Testing the equality of high dimensional distributions
Reza Modarres
Computational Statistics & Data Analysis, 2025, vol. 212, issue C
Abstract:
The Euclidean distance is not a suitable distance for high dimensional settings due to the distance concentration phenomenon. A novel statistic that is inspired by the interpoint distances, but avoids their computation, is proposed for comparing and visualizing high dimensional datasets. The new statistic is based on a high dimensional dissimilarity index that takes advantage of the concentration phenomenon. A simultaneous display of observations means and standard deviations that aids visualization, detection of suspect outliers, and enhances separability among the competing classes in the transformed space is discussed. The finite sample convergence of the dissimilarity indices is studied, nine statistics are compared under several distributions, and three applications are presented.
Keywords: Interpoint distance; Concentration; Dissimilarity indices (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947325001215
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:212:y:2025:i:c:s0167947325001215
DOI: 10.1016/j.csda.2025.108245
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().