Evaluation of the number of clusters in a data set using p-values from multiple tests of hypotheses
Dr. Soumita Modak
Communications in Statistics - Theory and Methods, 2024, vol. 53, issue 24, 8878-8889
Abstract:
This article proposes a novel, nonparametric measure, based on interpoint distances, for investigating whether a given data set contains any groups and, if so, how many groups are present. It is a cluster accuracy index applicable to data sets of arbitrary dimension, in association with any clustering algorithm for which the number of groups is specified a priori. We perform univariate, nonparametric, multiple statistical tests of hypotheses, in which as many dependent tests as the sample size are carried out using the interpoint distances. The resulting p-values are combined to reach a decision, which is taken in a step-wise process over the possible numbers of clusters. This reduces unnecessary computation compared with other accuracy measures from the literature. A data study establishes the proposed index’s efficiency and superiority.
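The abstract does not give the test statistic, the combination rule, or the stopping rule, so the following is only a rough sketch of the general idea (one distance-based test per observation, then a combined p-value) and not the article's index. Everything in it is an assumption made for illustration: the function names, the one-sided rank-sum test with a normal approximation, and the "twice the mean p-value" combination, which is chosen here only because it stays conservative when the tests are dependent.

```python
import math

def euclidean(a, b):
    """Euclidean interpoint distance between two points of any dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_sum_p_less(x, y):
    """One-sided rank-sum p-value for 'x tends to be smaller than y'.
    Assumption: normal approximation with midranks and a continuity
    correction, no tie correction in the variance (sketch only)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # midrank of the tied block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2          # small when x values are small
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu + 0.5) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail normal probability

def pointwise_p_values(points, labels):
    """One test per observation: are its distances to same-cluster points
    smaller than its distances to points in other clusters?"""
    pvals = []
    for i, p in enumerate(points):
        within = [euclidean(p, q) for j, q in enumerate(points)
                  if j != i and labels[j] == labels[i]]
        between = [euclidean(p, q) for j, q in enumerate(points)
                   if labels[j] != labels[i]]
        if not within or not between:
            pvals.append(1.0)  # no evidence available for this point
        else:
            pvals.append(rank_sum_p_less(within, between))
    return pvals

def combine_twice_mean(pvals):
    """Assumed combination rule: twice the mean p-value, capped at 1."""
    return min(1.0, 2.0 * sum(pvals) / len(pvals))

# Hypothetical toy data: two well-separated groups of six points each.
base = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
points = base + [(x + 10, y + 10) for x, y in base]
labels = [0] * 6 + [1] * 6
pvals = pointwise_p_values(points, labels)
combined = combine_twice_mean(pvals)
```

In a step-wise search, labels would come from any clustering algorithm run with 2, 3, ... groups in turn, and the combined p-value would be checked at each step; how the article orders and stops those steps is not stated in the abstract.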
Date: 2024
Downloads: (external link)
http://hdl.handle.net/10.1080/03610926.2024.2309967 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:53:y:2024:i:24:p:8878-8889
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/lsta20
DOI: 10.1080/03610926.2024.2309967
Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe