Clustering Large Datasets by Merging K-Means Solutions

Volodymyr Melnykov and Semhar Michael
Additional contact information
Volodymyr Melnykov: Department of Information Systems, Statistics, and Management Science at the University of Alabama
Semhar Michael: Department of Mathematics and Statistics at South Dakota State University

Journal of Classification, 2020, vol. 37, issue 1, No 7, 97-123

Abstract: Existing clustering methods range from simple but very restrictive to complex but more flexible. The K-means algorithm is one of the most popular clustering procedures due to its computational speed and intuitive construction. Unfortunately, the application of K-means in its traditional form based on Euclidean distances is limited to cases with spherical clusters of approximately the same volume and spread of points. Recent developments in the area of merging mixture components for clustering show good promise. We propose a general framework for hierarchical merging based on pairwise overlap between components which can be readily applied in the context of the K-means algorithm to produce meaningful clusters. Such an approach preserves the main advantage of the K-means algorithm: its speed. The developed ideas are illustrated on examples, studied through simulations, and applied to the problem of digit recognition.
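
Illustration (not part of the catalog record or the article): the abstract names the idea of merging K-means components hierarchically by pairwise overlap but gives no formulas. The sketch below is one possible reading under stated assumptions, not the authors' procedure. It assumes that each K-means component is treated as an equal-weight spherical Gaussian with a common standard deviation, so that the overlap of two components is 2Φ(-d/(2σ)) for centers a distance d apart, and that groups are merged with a single-linkage rule. The names `kmeans`, `pooled_sigma`, `overlap`, `merge_by_overlap`, and `merged_labels` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign()


def pooled_sigma(points, centers, labels):
    """Common spherical standard deviation implied by a K-means fit."""
    dim = len(points[0])
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return math.sqrt(sse / (len(points) * dim))


def overlap(c1, c2, sigma):
    """ASSUMED overlap measure: sum of the two misclassification
    probabilities for equal-weight spherical Gaussians with common sigma,
    i.e. 2 * Phi(-d / (2 * sigma)), written here via erfc."""
    d = math.dist(c1, c2)
    return math.erfc(d / (2.0 * sigma * math.sqrt(2.0)))


def merge_by_overlap(centers, sigma, n_groups):
    """Repeatedly merge the two groups whose closest pair of components
    has the largest overlap (ASSUMED single-linkage rule)."""
    groups = [{j} for j in range(len(centers))]

    def link(a, b):
        return max(overlap(centers[i], centers[j], sigma) for i in a for j in b)

    while len(groups) > n_groups:
        pairs = [(a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        a, b = max(pairs, key=lambda ab: link(groups[ab[0]], groups[ab[1]]))
        groups[a] |= groups[b]
        del groups[b]
    return groups


def merged_labels(labels, groups):
    """Map each K-means label to the index of its merged group."""
    lookup = {j: g for g, members in enumerate(groups) for j in members}
    return [lookup[lab] for lab in labels]
```

A typical use would fit K-means with more components than the expected number of clusters, estimate `sigma` with `pooled_sigma`, and then call `merge_by_overlap` followed by `merged_labels`. Because only the component centers enter the merging step, its cost does not grow with the number of observations, which is in keeping with the speed advantage the abstract emphasizes.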

Keywords: K-means; Finite mixture models; Merging components; Pairwise overlap; Classification EM algorithm
Date: 2020

Downloads: (external link)
http://link.springer.com/10.1007/s00357-019-09314-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:37:y:2020:i:1:d:10.1007_s00357-019-09314-8

Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2

DOI: 10.1007/s00357-019-09314-8

Journal of Classification is currently edited by Douglas Steinley

More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:jclass:v:37:y:2020:i:1:d:10.1007_s00357-019-09314-8