Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data

Biplab Paul, Shyamal K. De and Anil K. Ghosh

Journal of Multivariate Analysis, 2022, vol. 190, issue C

Abstract: Testing homogeneity of k (≥ 2) multivariate distributions is a challenging problem in statistics, especially when the dimension of the data is much larger than the sample size. Most existing tests perform poorly in this high dimension, low sample size (HDLSS) regime, and many of them cannot be used at all. In this article, we propose some nonparametric tests for this purpose. These tests are distribution-free in finite samples. They are based on a high dimensional clustering algorithm that partitions the data to form a contingency table, and the test statistics are constructed from the cell frequencies of that table. The tests can be based on a k-partition of the data, or the number of partitions can be estimated from the data and the tests constructed from that estimate. Under appropriate regularity conditions, we prove the consistency of these tests in the HDLSS asymptotic regime. We also consider a multiscale approach, where the results for different numbers of partitions are aggregated judiciously. An extensive simulation study and analysis of some benchmark datasets illustrate the superiority of the proposed tests over some existing methods.
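The general idea in the abstract (cluster the pooled data without looking at sample labels, cross-tabulate sample membership against cluster membership, and test independence in the resulting table) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the plain k-means routine, the Pearson chi-square statistic, the Monte Carlo permutation p-value, and every function name below are assumptions chosen for illustration. The sketch relies only on the fact that, under the null hypothesis, sample labels are exchangeable given the cluster assignment, so permuting the labels reproduces the conditional null distribution of the table.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Plain Lloyd k-means with farthest-point initialisation.
    A stand-in (assumption) for the paper's high dimensional clustering."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def contingency_table(sample_labels, cluster_labels, k, m):
    """Rows: the k samples; columns: the m clusters; cells: counts."""
    table = [[0] * m for _ in range(k)]
    for s, c in zip(sample_labels, cluster_labels):
        table[s][c] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic (an illustrative choice of statistic)."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat

def permutation_pvalue(sample_labels, cluster_labels, k, m, n_perm=2000, seed=0):
    """Monte Carlo p-value: shuffle sample labels with clusters held fixed."""
    rng = random.Random(seed)
    observed = chi_square(contingency_table(sample_labels, cluster_labels, k, m))
    shuffled = list(sample_labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(contingency_table(shuffled, cluster_labels, k, m)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical HDLSS example: dimension 200, 10 observations per sample.
    rng = random.Random(1)
    d, n = 200, 10
    sample0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    sample1 = [[rng.gauss(1.0, 1.0) for _ in range(d)] for _ in range(n)]
    labels = [0] * n + [1] * n
    clusters = kmeans(sample0 + sample1, k=2)
    stat, p = permutation_pvalue(labels, clusters, k=2, m=2)
    print(contingency_table(labels, clusters, 2, 2), stat, p)
```

Because the clustering step never sees the sample labels, the null distribution of the table depends only on its margins and not on the underlying data distribution, which is the sense in which a test of this kind can be distribution-free in finite samples.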

Keywords: Cluster analysis; Contingency tables; Dunn index; Generalized hypergeometric distribution; High dimensional asymptotics; Multiscale approach; Rand index; Tests of independence
Date: 2022

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0047259X21001743
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:jmvana:v:190:y:2022:i:c:s0047259x21001743

DOI: 10.1016/j.jmva.2021.104897

Journal of Multivariate Analysis is currently edited by de Leeuw, J.


 
Page updated 2025-03-19
Handle: RePEc:eee:jmvana:v:190:y:2022:i:c:s0047259x21001743