A robust EM clustering approach: ROBEM
Yüksel Öner and Hasan Bulut
Communications in Statistics - Theory and Methods, 2021, vol. 50, issue 19, 4587-4605
Abstract:
Cluster analysis is a group of multivariate statistical methods used to group identical or similar units. As with other classical statistical methods, classical cluster analysis gives misleading results when the multivariate data set contains outliers. Many approaches have been proposed to solve this problem. This study develops a new approach that aims to make the expectation maximization (EM) clustering algorithm resistant to outliers. To this end, we propose a new robust hybrid clustering algorithm called robust EM (ROBEM), which combines the EM clustering algorithm with the robust principal component analysis (ROBPCA) algorithm. The spatial EM algorithm has been proposed in the literature as a robust EM algorithm, but our simulation results and sample data applications show that ROBEM is more successful than spatial EM in terms of outlier detection rate and faulty classification rate. Moreover, ROBEM provides results similar to those of other well-known robust clustering algorithms, such as TCLUST and Trimmed k-Means.
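The abstract does not say how ROBEM couples ROBPCA with EM, so the following is only a toy illustration of the general idea of making EM resistant to outliers: flag outlying points with a robust rule first, then run EM on the remaining points. It is a minimal univariate sketch, not the authors' algorithm. A median/MAD rule is swapped in for ROBPCA, the mixture has two Gaussian components, and every function name, the cutoff of 3.0, and the data are assumptions made for illustration.

```python
import math
import statistics


def flag_outliers(xs, cutoff=3.0):
    """Flag points whose robust z-score (median/MAD) exceeds the cutoff.

    Assumption: this simple rule stands in for ROBPCA, which it is not.
    """
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    scale = 1.4826 * mad  # makes the MAD consistent with sigma under normality
    if scale == 0.0:
        return [False] * len(xs)
    return [abs(x - med) / scale > cutoff for x in xs]


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, n_iter=50, var_floor=1e-6):
    """Plain EM for a two-component univariate Gaussian mixture."""
    mean = [min(xs), max(xs)]
    var = [statistics.pvariance(xs)] * 2
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [weight[k] * normal_pdf(x, mean[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mean[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            sq = sum(r[k] * (x - mean[k]) ** 2 for r, x in zip(resp, xs))
            var[k] = max(sq / nk, var_floor)
            weight[k] = nk / len(xs)
    return mean, var, weight, resp


def robust_em_sketch(xs, cutoff=3.0):
    """Hypothetical two-stage pipeline: robust outlier flagging, then EM."""
    flags = flag_outliers(xs, cutoff)
    clean = [x for x, f in zip(xs, flags) if not f]
    outliers = [x for x, f in zip(xs, flags) if f]
    mean, _, _, resp = em_two_gaussians(clean)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return clean, outliers, mean, labels


# Made-up data: two tight groups plus two gross outliers.
data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4,
        9.6, 9.8, 10.0, 10.1, 10.3, 10.5,
        60.0, -45.0]
clean, outliers, means, labels = robust_em_sketch(data)
print("outliers:", outliers)
print("cluster means:", [round(m, 3) for m in means])
```

Running plain EM on the unfiltered data instead would let the two extreme values pull the component means and variances, which is the failure mode the abstract attributes to classical clustering.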
Date: 2021
Downloads:
http://hdl.handle.net/10.1080/03610926.2020.1722840 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:50:y:2021:i:19:p:4587-4605
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/lsta20
DOI: 10.1080/03610926.2020.1722840
Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe