Spherical k-Means Clustering

Hornik, Kurt; Feinerer, Ingo; Kober, Martin; Buchta, Christian

Journal of Statistical Software, 2012, vol. 50, issue 10

Abstract: Clustering text documents is a fundamental task in modern data analysis, requiring approaches that perform well in terms of both solution quality and computational efficiency. Spherical k-means clustering is one approach that addresses both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans, which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point algorithm, a genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). The performance of these solvers is investigated by means of a large-scale benchmark experiment.
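
To make the idea of prototype-based partitioning under cosine dissimilarity concrete, the following is a minimal sketch of a textbook fixed-point iteration for the standard spherical k-means problem. It is written in plain Python for illustration only and is not the skmeans package's implementation; the function names (normalize, dot, spherical_kmeans), the toy term weight vectors, and the choice of starting prototypes are all assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, prototypes, max_iter=100):
    """Alternate between assignment and prototype update until labels stop changing.

    docs       -- list of term weight vectors (hypothetical input format)
    prototypes -- list of k starting prototypes (assumed to be supplied by the caller)
    """
    docs = [normalize(d) for d in docs]
    prototypes = [normalize(p) for p in prototypes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: minimal cosine dissimilarity 1 - cos(d, p)
        # is the same as maximal cosine similarity.
        new_labels = [
            max(range(len(prototypes)), key=lambda j: dot(d, prototypes[j]))
            for d in docs
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each prototype becomes the normalized sum of its members.
        for j in range(len(prototypes)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[j] = normalize([sum(col) for col in zip(*members)])
    return labels, prototypes


# Toy example: two documents dominated by term 1, two dominated by term 2.
docs = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
labels, prototypes = spherical_kmeans(docs, [[1.0, 0.0], [0.0, 1.0]])
```

Because every vector is projected onto the unit sphere first, document length has no influence on the assignment; only the direction of the term weight vector matters. A practical solver would also need a strategy for initialization and for empty clusters, which this sketch leaves out.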

Date: 2012-09-18

Downloads: (external link)
https://www.jstatsoft.org/index.php/jss/article/view/v050i10/v50i10.pdf
https://www.jstatsoft.org/index.php/jss/article/do ... skmeans_0.2-3.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... 0i10-replication.zip
https://www.jstatsoft.org/index.php/jss/article/do ... ks_2011.10.24.tar.gz

Persistent link: https://EconPapers.repec.org/RePEc:jss:jstsof:v:050:i10

DOI: 10.18637/jss.v050.i10

Journal of Statistical Software is currently edited by Bettina Grün, Edzer Pebesma and Achim Zeileis

More articles in Journal of Statistical Software from Foundation for Open Access Statistics
Bibliographic data for series maintained by Christopher F. Baum.