Degrees of freedom and model selection for k-means clustering
David P. Hofmeyr
Computational Statistics & Data Analysis, 2020, vol. 149, issue C
Abstract:
A thorough investigation into the model degrees of freedom in k-means clustering is conducted. An extension of Stein’s lemma is used to obtain an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression, however empirical studies evince the appropriateness of the proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is then validated through experiments on simulated data as well as on a large collection of publicly available benchmark data sets from diverse application areas. Comparisons with popular existing techniques indicate that this approach is extremely competitive for selecting high quality clustering solutions.
Keywords: Clustering; k-means; Model selection; Cluster number determination; Degrees of freedom; Bayesian Information Criterion; Penalised likelihood (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947320300657
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:149:y:2020:i:c:s0167947320300657
DOI: 10.1016/j.csda.2020.106974
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().