Clustering with Minimum Spanning Trees: How Good Can It Be?
Marek Gagolewski (),
Anna Cena (),
Maciej Bartoszuk () and
Łukasz Brzozowski ()
Additional contact information
Marek Gagolewski: Systems Research Institute, Polish Academy of Sciences
Anna Cena: Warsaw University of Technology, Faculty of Mathematics and Information Science
Maciej Bartoszuk: QED Software
Łukasz Brzozowski: Warsaw University of Technology, Faculty of Mathematics and Information Science
Journal of Classification, 2025, vol. 42, issue 1, No 6, 90-112
Abstract:
Abstract Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
Keywords: Hierarchical partitional clustering; Minimum spanning tree; MST; Cluster validity measure; Single linkage; Genie algorithm; Mutual information (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s00357-024-09483-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:42:y:2025:i:1:d:10.1007_s00357-024-09483-1
Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2
DOI: 10.1007/s00357-024-09483-1
Access Statistics for this article
Journal of Classification is currently edited by Douglas Steinley
More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().