
An extension of the K-means algorithm to clustering skewed data

Volodymyr Melnykov and Xuwen Zhu
Additional contact information
Volodymyr Melnykov: University of Alabama
Xuwen Zhu: University of Louisville

Computational Statistics, 2019, vol. 34, issue 1, No 16, 373-394

Abstract: Grouping similar objects into common groups, also known as clustering, is an important problem of unsupervised machine learning. Various clustering algorithms have been proposed in the literature. In recent years, the need to analyze large amounts of data has led to reconsidering some fundamental clustering procedures. One of them is the celebrated K-means algorithm, popular among practitioners due to its speedy performance and appealingly intuitive construction. Unfortunately, the algorithm often shows poor performance unless data groups have spherical shapes and approximately the same sizes. In many applications, this restriction is so severe that the use of the K-means algorithm becomes questionable, misleading, or simply incorrect. We propose an extension of K-means that preserves the speed and intuitive interpretation of the original algorithm while providing greater flexibility in modeling clusters. The idea of the proposed generalization relies on the exponential transformation of Manly, originally designed to obtain near-normally distributed data. The suggested modification is derived and illustrated on several datasets with good results.
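
The exponential transformation of Manly mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' clustering algorithm; it only shows the transformation itself, y = (exp(λx) − 1)/λ for λ ≠ 0 and y = x for λ = 0, applied to simulated right-skewed data. The function names `manly_transform` and `sample_skewness`, the simulated data, and the hand-picked value λ = −1 are assumptions made for illustration; how the paper estimates transformation parameters is not stated in this record.

```python
import numpy as np

def manly_transform(x, lam):
    """Manly exponential transformation, applied elementwise.

    y = (exp(lam * x) - 1) / lam   if lam != 0
    y = x                          if lam == 0
    """
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return x.copy()
    # expm1 computes exp(z) - 1 accurately for small z
    return np.expm1(lam * x) / lam

def sample_skewness(x):
    """Moment-based sample skewness (hypothetical helper for this sketch)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5

# Simulated right-skewed data (assumption: unit-rate exponential draws).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)

# A negative lam compresses the right tail; lam = -1 is chosen by hand here.
y = manly_transform(x, lam=-1.0)

print("skewness before:", sample_skewness(x))
print("skewness after: ", sample_skewness(y))
```

With λ = −1 the transform maps unit-rate exponential data to 1 − exp(−x), which is uniformly distributed on (0, 1), so the skewness after transformation should be close to zero.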

Keywords: Exponential transformation; CEM algorithm; Cluster analysis; Skewness
Date: 2019

Downloads: (external link)
http://link.springer.com/10.1007/s00180-018-0821-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:34:y:2019:i:1:d:10.1007_s00180-018-0821-z

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2

DOI: 10.1007/s00180-018-0821-z


Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:compst:v:34:y:2019:i:1:d:10.1007_s00180-018-0821-z