A fast and recursive algorithm for clustering large datasets with k-medians
Hervé Cardot,
Peggy Cénac and
Jean-Marie Monnez
Computational Statistics & Data Analysis, 2012, vol. 56, issue 6, 1434-1449
Abstract:
Clustering with fast algorithms large samples of high dimensional data is an important challenge in computational statistics. A new class of recursive stochastic gradient algorithms designed for the k-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. A particular attention is paid to the averaged versions which are known to have better performances. A data-driven procedure that permits a fully automatic selection of the value of the descent step is also proposed. The performance of the averaged sequential estimator is compared on a simulation study, both in terms of computation speed and accuracy of the estimations, with more classical partitioning techniques such as k-means, trimmed k-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated on determining television audience profiles with a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours.
Keywords: Averaging; High dimensional data; k-medoids; Online clustering; Partitioning around medoids; Recursive estimators; Robbins–Monro; Stochastic approximation; Stochastic gradient (search for similar items in EconPapers)
Date: 2012
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (5)
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947311004178
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:56:y:2012:i:6:p:1434-1449
DOI: 10.1016/j.csda.2011.11.019
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().