Clustering Empirical Bootstrap Distribution Functions Parametrized by Galton–Watson Branching Processes
Lauri Varmann and Helena Mouriño
Additional contact information
Lauri Varmann: Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
Helena Mouriño: Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
Mathematics, 2024, vol. 12, issue 15, 1-25
Abstract:
The nonparametric bootstrap has been used in cluster analysis for various purposes. One of those purposes is to account for sampling variability. This can be achieved by obtaining a bootstrap approximation of the sampling distribution function of the estimator of interest and then clustering those distribution functions. Although the consistency of the nonparametric bootstrap in estimating transformations of the sample mean has been known for decades, little is known about how it carries over to clustering. Here, we investigated this problem with a simulation study. We considered single-linkage agglomerative hierarchical clustering and a three-type branching process for parametrized transformations of random vectors of relative frequencies of possible types of the index case of each process. In total, there were nine factors and 216 simulation scenarios in a fully-factorial design. The ability of the bootstrap-based clustering to recover the ground truth clusterings was quantified by the adjusted transfer distance between partitions. The results showed that in the best 18 scenarios, the average value of the distance was less than 20 percent of the maximum possible distance value. We noticed that the results most notably depended on the number of retained clusters, the distribution for sampling the prevalence of types, and the sample size appearing in the denominators of relative frequency types. The comparison of the bootstrap-based clustering results with so-called uninformed random partitioning results showed that in the vast majority of scenarios considered, the bootstrap-based approach led, on average, to remarkably lower classification errors than the random partitioning.
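The pipeline described in the abstract (bootstrap the sampling distribution of an estimator, then cluster the resulting distribution functions with single-linkage agglomerative clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the estimator is the plain sample mean, the data are toy normal samples rather than branching-process frequencies, the distance between distribution functions is the area between them on a common grid, and all function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def bootstrap_means(sample, n_boot, rng):
    """Return n_boot nonparametric bootstrap replicates of the sample mean."""
    idx = rng.integers(0, len(sample), size=(n_boot, len(sample)))
    return sample[idx].mean(axis=1)


def ecdf_on_grid(values, grid):
    """Evaluate the empirical distribution function of `values` at each grid point."""
    return np.searchsorted(np.sort(values), grid, side="right") / len(values)


def cluster_bootstrap_ecdfs(samples, n_clusters, n_boot=500, grid_size=200, seed=0):
    """Cluster samples by their bootstrap distribution functions (illustrative only)."""
    rng = np.random.default_rng(seed)
    reps = [bootstrap_means(np.asarray(s, dtype=float), n_boot, rng) for s in samples]

    # Common evaluation grid spanning all bootstrap replicates.
    lo = min(r.min() for r in reps)
    hi = max(r.max() for r in reps)
    grid = np.linspace(lo, hi, grid_size)
    step = grid[1] - grid[0]

    # One row per sample: its empirical bootstrap distribution function on the grid.
    curves = np.vstack([ecdf_on_grid(r, grid) for r in reps])

    # Assumed distance: approximate area between two distribution functions.
    dists = pdist(curves, metric="cityblock") * step

    # Single-linkage agglomerative clustering, cut to retain n_clusters clusters.
    tree = linkage(dists, method="single")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: three samples centred at 0 and three centred at 5.
data_rng = np.random.default_rng(1)
samples = [data_rng.normal(0.0, 1.0, 100) for _ in range(3)]
samples += [data_rng.normal(5.0, 1.0, 100) for _ in range(3)]
labels = cluster_bootstrap_ecdfs(samples, n_clusters=2)
```

In this toy setting the two groups are far apart relative to the bootstrap spread, so the retained two-cluster partition should match the ground truth. Measuring recovery in harder settings would need a partition distance such as the adjusted transfer distance named in the abstract, which is not implemented here.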
Keywords: Galton–Watson branching process; hierarchical clustering; nonparametric bootstrap; simulation; transfer distance
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/15/2409/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/15/2409/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:15:p:2409-:d:1448698
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.