Balancing effort and benefit of K-means clustering algorithms in Big Data realms
Joaquín Pérez-Ortega,
Nelva Nely Almanza-Ortega and
David Romero
PLOS ONE, 2018, vol. 13, issue 9, 1-19
Abstract:
In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms.
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201874 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 01874&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0201874
DOI: 10.1371/journal.pone.0201874
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().