Identifying Effective Algorithms and Measures for Enhanced Clustering Quality: A Comprehensive Examination of Arbitrary Decisions in Hierarchical Clustering Algorithms

Behzadidoost, Rashid; Izadkhah, Habib

Rashid Behzadidoost and Habib Izadkhah
Additional contact information
Rashid Behzadidoost: University of Tabriz
Habib Izadkhah: University of Tabriz

Journal of Classification, 2025, vol. 42, issue 2, No 9, 457-489

Abstract: Hierarchical clustering algorithms are widely used in various applications to group similar samples. However, a common challenge arises during the merging process when two or more clusters have equal values, with no clear criterion to determine which clusters should be merged next. This leads to arbitrary decisions, which can negatively impact the quality of clustering results. The issue of arbitrary decisions has been highlighted in previous studies, emphasizing the need for algorithms and measures that minimize their occurrence. This study provides a comprehensive analysis of arbitrary decisions generated by nine popular hierarchical clustering algorithms across 100 measures, including similarities, distances, and entropy. In total, 737 unique combinations of clustering algorithms and measures were evaluated, many of which are novel and have not been previously explored. The results show that the agglomerative information bottleneck algorithm, when paired with measures such as cross-entropy and Jensen difference, the combined algorithm with Soergel and fidelity measures, the weighted combined algorithm with cosine similarity and fidelity measures, and the median algorithm with covariance similarity and squared chord distance measures, exhibited minimal arbitrary decisions for binary data. For non-binary data, the agglomerative information bottleneck algorithm with cross-entropy and Kullback-Leibler measures, the centroid algorithm with Hellinger and squared chord distance measures, and the median algorithm with Hellinger and Jeffries-Matusita distance measures showed fewer arbitrary decisions. This study provides valuable insights for researchers and practitioners by identifying specific clustering algorithms and measures that are less prone to arbitrary decisions, thereby enhancing the quality of clustering outcomes.
Overall, this paper contributes to the field of clustering by evaluating the effectiveness of new combinations of algorithms and measures in reducing arbitrary decisions.
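
To make the notion of an arbitrary decision concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that an arbitrary decision is a merge step in which more than one pair of clusters attains the minimum inter-cluster distance, and it uses single linkage with Hamming distance purely for illustration. The names `hamming` and `count_tied_merges` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def count_tied_merges(points, dist=hamming):
    """Run single-linkage agglomeration and count merge steps with a tie.

    Assumed definition: a step is an "arbitrary decision" when more than
    one pair of clusters shares the smallest inter-cluster distance.
    """
    clusters = [[i] for i in range(len(points))]
    ties = 0
    while len(clusters) > 1:
        # Single-linkage distance between every pair of current clusters.
        pair_dist = {
            (i, j): min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
            for i, j in combinations(range(len(clusters)), 2)
        }
        best = min(pair_dist.values())
        candidates = [p for p, d in pair_dist.items() if d == best]
        if len(candidates) > 1:
            ties += 1  # several equally good merges: the choice is arbitrary
        i, j = candidates[0]  # break the tie by index order (i < j)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return ties


binary_points = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)]
real_points = [(0.0,), (1.0,), (3.0,), (7.0,)]

tied_binary = count_tied_merges(binary_points)
tied_real = count_tied_merges(real_points, dist=lambda a, b: abs(a[0] - b[0]))
```

In this toy input the binary points produce one tied merge step, while the well-separated real-valued points produce none, which is consistent with the abstract treating binary and non-binary data separately. Swapping in a different `dist` function or linkage rule is how one would compare algorithm and measure combinations under this assumed definition.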

Keywords: Clustering; Arbitrary decisions; Hierarchical clustering algorithms
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s00357-025-09506-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:42:y:2025:i:2:d:10.1007_s00357-025-09506-5

Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2

DOI: 10.1007/s00357-025-09506-5

Journal of Classification is currently edited by Douglas Steinley

More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.