EconPapers    
Economics at your fingertips  
 

Determinantal consensus clustering

Serge Vicente () and Alejandro Murua-Sazo ()
Additional contact information
Serge Vicente: Université de Montréal
Alejandro Murua-Sazo: Université de Montréal

Advances in Data Analysis and Classification, 2023, vol. 17, issue 4, No 2, 829-858

Abstract: Abstract Random restart of a given algorithm produces many partitions that can be aggregated to yield a consensus clustering. Ensemble methods have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes or DPPs for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points in initial sets, so that sets with similar points have less chance of being generated than sets with very distinct points. Most current inital sets are generated with center points sampled uniformly at random. We show through extensive simulations that, contrary to DPPs, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets. The latter are two key properties that make DPPs achieve good performance. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clusterings which are based on uniform random sampling of center points.

Keywords: Classification; Kernel-based validation index; Kernel k-means; Partitioning about medoids; Radial basis function; Repulsion; Voronoi diagram; 62H30; 62-08 (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s11634-022-00514-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:17:y:2023:i:4:d:10.1007_s11634-022-00514-6

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-022-00514-6

Access Statistics for this article

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-04-12
Handle: RePEc:spr:advdac:v:17:y:2023:i:4:d:10.1007_s11634-022-00514-6