A new initialisation method for k-means algorithm in the clustering problem: data analysis
Abolfazl Kazemi and Ghazaleh Khodabandehlouie
International Journal of Data Analysis Techniques and Strategies, 2018, vol. 10, issue 3, 291-304
Abstract:
Clustering is one of the most important tasks in exploratory data analysis. One of the simplest and most widely used clustering algorithms is k-means, which was proposed in 1955. The k-means algorithm is conceptually simple and easy to implement, as evidenced by hundreds of publications over the last 50 years that extend it in various ways. Unfortunately, because of its nature, the algorithm is very sensitive to the initial placement of the cluster centres. To address this problem, many initialisation methods (IMs) have been proposed. In this thesis, we first provide a historical overview of these methods. We then present two new non-random initialisation methods for the k-means algorithm. Finally, we analyse experimental results on real datasets and evaluate the performance of the IMs using the TOPSIS multi-criteria decision-making method. The results show that well-known k-means IMs often perform poorly and that strong alternative approaches exist.
Keywords: clustering; K-means algorithm; cluster centre initialisation; sum of squared error criterion; data analysis.
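The sensitivity to initial centre placement described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it is plain Lloyd-style k-means run from two hand-picked sets of starting centres, scored with the sum of squared error (SSE) criterion named in the keywords. The four-point dataset, the function names (sq_dist, assign, kmeans) and the starting centres are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centres: Sequence[Point]) -> List[int]:
    """Index of the nearest centre for every point."""
    return [
        min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
        for p in points
    ]


def kmeans(points: Sequence[Point], init_centres: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int], float]:
    """Lloyd-style k-means from the given starting centres (illustrative only)."""
    centres = [tuple(float(v) for v in c) for c in init_centres]
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centre to the mean of its assigned points.
                new_centres.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centre where it was.
                new_centres.append(old)
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    labels = assign(points, centres)
    sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, sse


# Hypothetical toy data: two tight pairs of points, far apart horizontally.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Starting centres near the two pairs versus centres that split the data
# top/bottom; the second start is already a fixed point of the iteration.
_, _, sse_good = kmeans(data, [(0.0, 0.5), (4.0, 0.5)])
_, _, sse_bad = kmeans(data, [(2.0, 0.0), (2.0, 1.0)])

print(sse_good, sse_bad)
```

On this toy data the first start converges to an SSE of 1.0, while the second remains at a poorer local optimum with an SSE of 16.0, even though both runs use the same algorithm and the same number of clusters.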
Date: 2018
Downloads: (external link)
http://www.inderscience.com/link.php?id=94127 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:10:y:2018:i:3:p:291-304
More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd