An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
Md. Zubair,
MD. Asif Iqbal,
Avijeet Shil,
M. J. M. Chowdhury,
Mohammad Ali Moni and
Iqbal H. Sarker
Additional contact information
Md. Zubair: Chittagong University of Engineering & Technology
MD. Asif Iqbal: Chittagong University of Engineering & Technology
Avijeet Shil: Chittagong University of Engineering & Technology
M. J. M. Chowdhury: La Trobe University
Mohammad Ali Moni: The University of Queensland
Iqbal H. Sarker: Chittagong University of Engineering & Technology
Annals of Data Science, 2024, vol. 11, issue 5, No 2, 1525-1544
Abstract:
The K-means algorithm is one of the best-known unsupervised machine learning algorithms. It partitions data into distinct, non-overlapping clusters, with each point assigned to exactly one group: the minimum squared distance criterion assigns each point to its nearest cluster centroid. One of the main concerns with K-means is finding optimal initial centroids, and determining the optimum positions of the initial cluster centroids before the first iteration is the most challenging part of the task. This paper proposes an approach that finds the optimal initial centroids efficiently, reducing both the number of iterations and the execution time. To analyze the effectiveness of the proposed method, we conducted experiments on several real-world datasets, beginning with COVID-19 and patient datasets. A synthetic dataset of 10M instances with 8 dimensions is also used to estimate the performance of the proposed algorithm. Experimental results show that the proposed method outperforms the traditional k-means++ and random centroid initialization methods in both computation time and number of iterations.
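The effect of initialization on iteration count can be illustrated with a minimal sketch of the standard K-means loop (Lloyd's algorithm). This is not the initialization method proposed in the paper, which the abstract does not specify; the function names `squared_distance` and `kmeans`, the toy `points`, and both sets of starting centroids are assumptions chosen purely for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm (illustrative sketch, not the paper's method).

    Returns (centroids, labels, iterations), where `iterations` counts the
    passes made until the centroids stop changing.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            return centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# One starting centroid per group versus both starting in the same group.
_, labels_spread, iters_spread = kmeans(points, [(0, 0), (10, 10)])
_, labels_close, iters_close = kmeans(points, [(0, 0), (0, 1)])
print(iters_spread, iters_close)
```

Both runs reach the same partition on this toy data, but the run whose starting centroids already sit in different groups needs fewer passes. That is the cost an initialization strategy aims to reduce.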
Keywords: K-means Clustering; Principal Component Analysis; Percentile; Unsupervised Algorithm; Machine Learning; Data Science
Date: 2024
Downloads: http://link.springer.com/10.1007/s40745-022-00428-2 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:11:y:2024:i:5:d:10.1007_s40745-022-00428-2
Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745
DOI: 10.1007/s40745-022-00428-2
Annals of Data Science is currently edited by Yong Shi
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.