EconPapers    
Economics at your fingertips  
 

K-bMOM: A robust Lloyd-type clustering algorithm based on bootstrap median-of-means

Camille Brunet-Saumard, Edouard Genetay and Adrien Saumard

Computational Statistics & Data Analysis, 2022, vol. 167, issue C

Abstract: The median-of-means is an estimator of the mean of a random variable that has emerged as an efficient and flexible tool to design robust learning algorithms with optimal theoretical guarantees. However, its use for the clustering task suggests dividing the dataset into blocks, which may provoke the disappearance of some clusters in some blocks and lead to bad performances. To overcome this difficulty, a procedure termed “bootstrap median-of-means” is proposed, where the blocks are generated with a replacement in the dataset. Considering the estimation of the mean of a random variable, the bootstrap median-of-means has a better breakdown point than the median-of-means if enough blocks are generated. A clustering algorithm called K-bMOM is designed, by performing Lloyd-type iterations together with the use of the bootstrap median-of-means strategy. Good performances are obtained on simulated and real-world datasets for color quantization and an emphasis is put on the benefits of our robust intialization procedure. On the theoretical side, K-bMOM is also proven to have a non-trivial probabilistic breakdown point in well-clusterizable situations.

Keywords: Bootstrap; Breakdown point; Color quantization; Adversarial contamination; Median-of-means; Robust clustering (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947321002048
Full text for ScienceDirect subscribers only.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:167:y:2022:i:c:s0167947321002048

DOI: 10.1016/j.csda.2021.107370

Access Statistics for this article

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:167:y:2022:i:c:s0167947321002048