Consensus clustering in Stata

Drago, Carlo

Italian Stata Users' Group Meetings 2025 from Stata Users Group

Abstract: This work considers consensus clustering in Stata, combining bootstrapped k-means with hierarchical clustering based on a co-association matrix. The method addresses the potential instability of partitioning-based clustering by aggregating results from multiple bootstrap samples, which improves robustness and reproducibility. At each iteration, k-means clustering is applied to a bootstrap sample, and the results are collected in a large cluster assignment matrix. A consensus (co-association) matrix is then built to measure how often each pair of observations falls in the same cluster across all iterations. This matrix is transformed into a dissimilarity matrix and passed to hierarchical clustering to obtain a final, stable partition. The framework shows how consensus clustering can be performed robustly and efficiently in Stata: it combines Stata routines, bootstrap sampling, and optimized Mata routines that compute the co-association matrix. The approach is broadly applicable to clustering tasks in the social sciences, economics, epidemiology, and other fields where cluster stability is critical.
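The pipeline described in the abstract can be illustrated with a minimal, language-neutral sketch. This is not the author's Stata/Mata implementation, which this record does not reproduce; it is a pure-Python illustration under stated assumptions. All function names (`kmeans`, `coassociation`, `average_linkage`, `consensus_cluster`) and the toy data are hypothetical. Two choices are assumptions rather than details taken from the abstract: each co-association entry is normalized by the number of bootstrap samples in which both observations appear, and the final hierarchical step uses average linkage on one minus the consensus value.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(sorted(set(points)), k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def coassociation(data, k, n_boot, rng):
    """Share of bootstrap runs in which each pair lands in the same cluster.

    Assumption: each pair is normalized by the number of runs in which
    both observations were drawn into the bootstrap sample.
    """
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        labels = kmeans([data[i] for i in draw], k, rng)
        label_of = dict(zip(draw, labels))  # one label per distinct observation
        for i in label_of:
            for j in label_of:
                sampled[i][j] += 1
                together[i][j] += label_of[i] == label_of[j]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def average_linkage(dist, k):
    """Naive agglomerative clustering (average linkage), cut at k groups."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for g, members in enumerate(clusters):
        for i in members:
            labels[i] = g
    return labels

def consensus_cluster(data, k, n_boot, rng):
    """Bootstrap k-means, build the consensus matrix, then cluster 1 - consensus."""
    cons = coassociation(data, k, n_boot, rng)
    n = len(data)
    dist = [[1.0 - cons[i][j] for j in range(n)] for i in range(n)]
    return average_linkage(dist, k), cons

# Toy data (hypothetical): two well-separated groups of ten points each.
rng = random.Random(42)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(10)]
data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(10)]
labels, cons = consensus_cluster(data, k=2, n_boot=30, rng=rng)
print(labels)
```

On data this cleanly separated, the consensus values for pairs within a group should be close to 1 and those across groups close to 0, so the final cut recovers the two groups. The pairwise loops are quadratic in the number of observations, which is presumably why the talk moves the co-association step into optimized Mata routines.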

Date: 2025-10-01

Downloads: (external link)
http://repec.org/isug2025/Itay25_Drago1.pdf presentation materials (application/pdf)
Our link check indicates that this URL is bad, the error code is: 404 Not Found

Persistent link: https://EconPapers.repec.org/RePEc:boc:isug25:10
