Consensus clustering in Stata
Carlo Drago
Additional contact information
Carlo Drago: Università degli Studi Niccolò Cusano
Italian Stata Users' Group Meetings 2025 from Stata Users Group
Abstract:
This work presents consensus clustering in Stata, combining bootstrapped k-means with hierarchical clustering based on a co-association matrix. The method addresses the instability that partition-based clustering can exhibit by aggregating results from multiple bootstrap samples, which improves robustness and reproducibility. At each iteration, k-means clustering is applied to a bootstrap sample, and the resulting assignments are collected in a large cluster assignment matrix. A consensus (co-association) matrix is then built to measure how often each pair of observations falls in the same cluster across all iterations. This matrix is transformed into a dissimilarity matrix, to which hierarchical clustering is applied to obtain a final, stable partition. The framework shows how consensus clustering can be performed robustly and efficiently in Stata: it combines Stata routines, bootstrap sampling, and optimized Mata routines that compute the co-association matrix efficiently. The approach is broadly applicable to clustering tasks in the social sciences, economics, epidemiology, and other fields where cluster stability is critical.
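The pipeline described in the abstract (bootstrap, k-means, co-association matrix, dissimilarity, hierarchical clustering) can be illustrated with a minimal, language-neutral sketch. The Python code below is not the author's Stata/Mata implementation. The function names (`kmeans`, `consensus_matrix`, `average_linkage`), the toy data, the number of bootstrap replications, and the choice of average linkage are all assumptions made for illustration. Here the consensus value for a pair is taken to be the share of bootstrap samples containing both observations in which they received the same k-means label.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)  # random starting centers (assumption)
    def nearest(p):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p) for p in points]

def consensus_matrix(data, k, n_boot, seed=0):
    """Share of bootstrap samples in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]  # times i and j got the same label
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    for _ in range(n_boot):
        draws = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        labels = kmeans([data[i] for i in draws], k, rng)
        label_of = dict(zip(draws, labels))  # duplicates share one label
        for i in label_of:
            for j in label_of:
                sampled[i][j] += 1
                if label_of[i] == label_of[j]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, cut at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Toy data (hypothetical): two well-separated groups of four points each.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
        (10.0, 10.1), (10.2, 9.9), (9.9, 10.3), (10.1, 10.0)]

consensus = consensus_matrix(data, k=2, n_boot=30)
dissimilarity = [[1.0 - v for v in row] for row in consensus]
final = average_linkage(dissimilarity, 2)
print(final)
```

The pairwise double loop is written for readability, not speed. The abstract attributes the efficiency of the co-association step to optimized Mata routines, which this sketch does not attempt to reproduce.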
Date: 2025-10-01
Downloads: http://repec.org/isug2025/ presentation materials (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:boc:isug25:10
Bibliographic data for series maintained by Christopher F Baum.