Evaluation of the number of clusters in a data set using p-values from multiple tests of hypotheses

Dr. Soumita Modak

Communications in Statistics - Theory and Methods, 2024, vol. 53, issue 24, 8878-8889

Abstract: This article proposes a novel, nonparametric measure based on interpoint distances to investigate whether a given data set contains any groups and, if so, how many groups are present in total. It is a cluster accuracy index applicable to data sets of arbitrary dimension, in association with any clustering algorithm that requires the number of groups to be specified a priori. We perform univariate, nonparametric, multiple statistical tests of hypotheses, carrying out as many dependent tests as the sample size using the interpoint distances. Their p-values are combined to reach a decision, which is taken in a stepwise process over the possible numbers of clusters. This process reduces unnecessary computations compared with other accuracy measures from the literature. A data study establishes the proposed index’s efficiency and superiority.
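
The abstract does not say how the p-values are combined or which hypotheses are tested, so the following is only a minimal sketch under stated assumptions, not the article's index. It assumes a Bonferroni combination (a standard rule that remains valid when the tests are dependent) and assumes the null hypothesis at each step is "k clusters are adequate". The names `bonferroni_combined_p`, `first_accepted_k`, and `p_values_for` are hypothetical.

```python
def bonferroni_combined_p(p_values):
    """Combine p-values from possibly dependent tests with the Bonferroni bound.

    Assumption: Bonferroni is used here only as a stand-in; the abstract
    does not name the combination rule.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return min(1.0, len(p_values) * min(p_values))


def first_accepted_k(p_values_for, candidate_ks, alpha=0.05):
    """Step through candidate cluster counts in the given order.

    p_values_for(k) is a hypothetical callable returning one p-value per
    observation for the assumed null "k clusters are adequate".
    The search stops at the first k that is not rejected, so later
    candidates are never evaluated.
    """
    for k in candidate_ks:
        combined = bonferroni_combined_p(p_values_for(k))
        if combined > alpha:
            return k, combined
    return None, None


# Toy usage with made-up p-values (illustrative only, not real results).
toy = {1: [0.001, 0.002], 2: [0.3, 0.4], 3: [0.9, 0.9]}
evaluated = []

def toy_p_values(k):
    evaluated.append(k)  # record which candidates were actually computed
    return toy[k]

chosen_k, combined_p = first_accepted_k(toy_p_values, [1, 2, 3])
```

In this toy run the search stops at the second candidate and never computes p-values for the third, which is the kind of saving a stepwise decision allows.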

Date: 2024

Downloads: (external link)
http://hdl.handle.net/10.1080/03610926.2024.2309967 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:53:y:2024:i:24:p:8878-8889

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/lsta20

DOI: 10.1080/03610926.2024.2309967

Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe