Clustering Large Datasets by Merging K-Means Solutions
Volodymyr Melnykov and Semhar Michael
Additional contact information
Volodymyr Melnykov: Department of Information Systems, Statistics, and Management Science at the University of Alabama
Semhar Michael: Department of Mathematics and Statistics at South Dakota State University
Journal of Classification, 2020, vol. 37, issue 1, No 7, 97-123
Abstract:
Existing clustering methods range from simple but very restrictive to complex but more flexible. The K-means algorithm is one of the most popular clustering procedures due to its computational speed and intuitive construction. Unfortunately, the application of K-means in its traditional form based on Euclidean distances is limited to cases with spherical clusters of approximately the same volume and spread of points. Recent developments in the area of merging mixture components for clustering show good promise. We propose a general framework for hierarchical merging based on pairwise overlap between components which can be readily applied in the context of the K-means algorithm to produce meaningful clusters. Such an approach preserves the main advantage of the K-means algorithm: its speed. The developed ideas are illustrated on examples, studied through simulations, and applied to the problem of digit recognition.
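The merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every function name here (kmeans, pooled_sigma, overlap, merge_components) is hypothetical. It assumes that K-means components are treated as spherical Gaussians with a common variance and equal mixing proportions, in which case the misclassification probability between two components at distance d is Φ(-d / (2σ)), and the pairwise overlap is taken as the sum of the two misclassification probabilities. The single-linkage style merging order is an illustrative choice, not necessarily the linkage used in the paper.

```python
import math
import random
from statistics import NormalDist


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def pooled_sigma(points, centers, labels):
    """Common standard deviation under the assumed spherical model."""
    n, dim = len(points), len(points[0])
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return math.sqrt(sse / (n * dim))


def overlap(c1, c2, sigma):
    """Sum of the two misclassification probabilities (assumed model)."""
    return 2.0 * NormalDist().cdf(-math.dist(c1, c2) / (2.0 * sigma))


def merge_components(centers, sigma, n_groups):
    """Merge components in order of decreasing overlap until n_groups remain.

    Returns one group id per component (single-linkage style, via union-find).
    """
    k = len(centers)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(((overlap(centers[i], centers[j], sigma), i, j)
                    for i in range(k) for j in range(i + 1, k)), reverse=True)
    groups = k
    for _, i, j in pairs:
        if groups <= n_groups:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            groups -= 1
    return [find(i) for i in range(k)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two elongated groups that K-means has to split into several pieces.
    pts = [(rng.gauss(0, 5), rng.gauss(0, 0.5)) for _ in range(200)]
    pts += [(rng.gauss(0, 5), rng.gauss(6, 0.5)) for _ in range(200)]
    centers, labels = kmeans(pts, k=8)
    sigma = pooled_sigma(pts, centers, labels)
    comp_to_group = merge_components(centers, sigma, n_groups=2)
    merged = [comp_to_group[lab] for lab in labels]
    print(sorted(set(merged)))
```

The design keeps the expensive step (K-means with a deliberately large number of components) separate from the merging step, which only works with the K centers and one variance estimate. This is consistent with the abstract's point that merging preserves the speed of K-means.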
Keywords: K-means; Finite mixture models; Merging components; Pairwise overlap; Classification EM algorithm
Date: 2020
Downloads: http://link.springer.com/10.1007/s00357-019-09314-8 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:37:y:2020:i:1:d:10.1007_s00357-019-09314-8
Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2
DOI: 10.1007/s00357-019-09314-8
Journal of Classification is currently edited by Douglas Steinley
More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.