Statistically validated hierarchical clustering: Nested partitions in hierarchical trees
Christian Bongiorno,
Salvatore Miccichè and
Rosario Mantegna
Additional contact information
Christian Bongiorno: MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay
Salvatore Miccichè: DiFC - Dipartimento di Fisica e Chimica [Palermo] - UNIPA - Università degli studi di Palermo - University of Palermo
Post-Print from HAL
Abstract:
We develop a fast, scalable greedy algorithm that detects a nested partition in a dendrogram obtained by hierarchical clustering of a multivariate series. The algorithm assigns a p-value to each clade observed in the hierarchical tree. The p-value is obtained by computing a number of bootstrap replicas of the dissimilarity matrix and testing, for each clade, the difference between the dissimilarity associated with that clade and the dissimilarity associated with the clade of its parent node. We demonstrate the efficacy of our algorithm on a set of benchmarks generated with a hierarchical factor model, and we compare its results with those of Pvclust, a widely used algorithm that takes a global approach and was originally motivated by phylogenetic studies. In our numerical experiments we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We also apply our algorithm to a reference empirical dataset. We verify that our algorithm is much faster than Pvclust and scales better both in the number of elements and in the number of records of the investigated multivariate set. It yields a hierarchically nested partition in much less time than currently widely used algorithms, making statistically validated cluster analysis feasible for very large systems.
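The bootstrap test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the dissimilarity is taken as one minus the Pearson correlation, the dissimilarity of a clade is the average-linkage distance between its two children, replicas are built by resampling records with replacement, and the p-value is the add-one-smoothed fraction of replicas in which the clade is no more compact than its parent. The names `toy_hierarchical_factor_data`, `correlation_dissimilarity`, `average_link` and `clade_p_value` are hypothetical.

```python
import numpy as np


def toy_hierarchical_factor_data(n_records=500, seed=42):
    """Toy two-level factor model (an assumption, not the paper's benchmark).

    All four elements load on a common factor; elements 0 and 1 also share
    a second factor, so {0, 1} is a clade nested inside {0, 1, 2, 3}.
    """
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(n_records)  # factor shared by all elements
    sub = rng.standard_normal(n_records)     # factor shared by elements 0 and 1
    noise = rng.standard_normal((4, n_records))
    X = np.empty((4, n_records))
    X[0] = common + sub + 0.5 * noise[0]
    X[1] = common + sub + 0.5 * noise[1]
    X[2] = common + noise[2]
    X[3] = common + noise[3]
    return X


def correlation_dissimilarity(X):
    """Return d_ij = 1 - corr_ij for the rows of X (elements x records)."""
    return 1.0 - np.corrcoef(X)


def average_link(D, group_a, group_b):
    """Mean dissimilarity between two disjoint groups of element indices."""
    return D[np.ix_(group_a, group_b)].mean()


def clade_p_value(X, left, right, sibling, n_boot=1000, seed=0):
    """Bootstrap p-value for the clade left+right against its parent.

    Assumed null hypothesis: the clade is not more compact than its parent,
    i.e. d(clade) >= d(parent). The parent is the clade merged with `sibling`.
    """
    rng = np.random.default_rng(seed)
    n_records = X.shape[1]
    clade = list(left) + list(right)
    exceed = 0
    for _ in range(n_boot):
        # One replica: resample records (columns) with replacement.
        cols = rng.integers(0, n_records, size=n_records)
        D = correlation_dissimilarity(X[:, cols])
        d_clade = average_link(D, left, right)
        d_parent = average_link(D, clade, sibling)
        if d_clade >= d_parent:
            exceed += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    X = toy_hierarchical_factor_data()
    # Clade {0, 1} inside parent {0, 1, 2, 3}: expected to be well supported.
    print(clade_p_value(X, [0], [1], [2, 3], n_boot=200))
    # Mismatched clade {0, 2} with sibling {1}: expected to be poorly supported.
    print(clade_p_value(X, [0], [2], [1], n_boot=200))
```

In a full analysis one such p-value would be computed for every clade of the dendrogram and compared with a threshold adjusted for multiple hypothesis tests. The abstract does not say which correction is used, so none is shown here.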
Date: 2022-05
New Economics Papers: this item is included in nep-ure
Note: View the original document on HAL open archive server: https://hal.science/hal-02157744v1
Published in Physica A: Statistical Mechanics and its Applications, 2022, 593, pp.126933. ⟨10.1016/j.physa.2022.126933⟩
Downloads:
https://hal.science/hal-02157744v1/document (application/pdf)
Related works:
Journal Article: Statistically validated hierarchical clustering: Nested partitions in hierarchical trees (2022) 
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-02157744
DOI: 10.1016/j.physa.2022.126933