Efficient estimation of the number of clusters for high-dimension data
Spiridon Kasapis,
Geng Zhang,
Jonathon M Smereka and
Nickolas Vlahopoulos
The Journal of Defense Modeling and Simulation, 2025, vol. 22, issue 4, 429-441
Abstract:
The exponential growth of digital image data has given rise to the need for efficient content management and retrieval tools. Currently, there is a lack of tools for processing the collected unlabeled data in a schematic manner. K-means is one of the most widely used clustering methods and has been applied in a variety of fields, one of them being image sorting. Although a useful tool for image management, the K-means method is heavily influenced by its initializations, the most important one being the need to know the number of clusters a priori. A number of different methods have been proposed for identifying the correct number of clusters for K-means, one of them being the variance ratio criterion (VRC). Despite its popularity, the VRC method comes with two very important shortcomings: it only yields good results when the data dimensionality is low, and it does not scale well to a high number of clusters, making it very difficult to use in computer vision applications. We propose an extension to the VRC method that works for larger numbers of clusters and for high-dimensional data sets, and is therefore suited to image data sets.
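For orientation, the baseline VRC mentioned in the abstract (also known as the Calinski-Harabasz index) is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom, and the candidate K with the highest score is selected. The following is a minimal sketch of that baseline only, not of the extension proposed in the article. The helper names, the deterministic farthest-point seeding, and the small two-dimensional toy data are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_point_init(points, k):
    """Deterministic seeding (an illustrative assumption, not from the article)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


def vrc(points, labels):
    """Variance ratio criterion: [B / (k - 1)] / [W / (n - k)]."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distances of centroids to the overall mean
    within = 0.0   # W: squared distances of points to their own centroid
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (assumed): three tight, well-separated 2-D blobs of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score each candidate K and keep the one with the highest VRC.
scores = {k: vrc(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The toy data is deliberately low-dimensional with few clusters, which is the regime in which the abstract says the VRC behaves well. The sketch does not attempt to reproduce the high-dimensional or many-cluster behaviour the article addresses.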
Keywords: Clustering; K-means; number of clusters; initializations; unsupervised learning schema; computer vision; variance ratio criterion
Date: 2025
Downloads: (external link)
https://journals.sagepub.com/doi/10.1177/15485129231214569 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:joudef:v:22:y:2025:i:4:p:429-441
DOI: 10.1177/15485129231214569