
RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm

Yumi Kondo, Matias Salibian-Barrera and Ruben Zamar

Journal of Statistical Software, 2016, vol. 072, issue i05

Abstract: Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables without useful information to separate the clusters). SK-means works very well on clean and complete data but cannot handle outliers or missing data. To remedy these problems, we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means in selecting important variables and identifying clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data.

Date: 2016-08-28

Downloads: (external link)
https://www.jstatsoft.org/index.php/jss/article/view/v072i05/v72i05.pdf
https://www.jstatsoft.org/index.php/jss/article/do ... 05/RSKC_2.4.2.tar.gz
https://www.jstatsoft.org/index.php/jss/article/do ... 2i05/v72i05-data.zip
https://www.jstatsoft.org/index.php/jss/article/do ... ile/v072i05/v72i05.R


Persistent link: https://EconPapers.repec.org/RePEc:jss:jstsof:v:072:i05

DOI: 10.18637/jss.v072.i05


Journal of Statistical Software is currently edited by Bettina Grün, Edzer Pebesma and Achim Zeileis

More articles in Journal of Statistical Software from Foundation for Open Access Statistics
Bibliographic data for series maintained by Christopher F. Baum.

 
Page updated 2025-03-19
Handle: RePEc:jss:jstsof:v:072:i05