Clustering of high-dimensional observations
Yong Wang and Reza Modarres
Journal of Nonparametric Statistics, 2025, vol. 37, issue 2, 319-343
Abstract:
We present a novel clustering method for high-dimensional, low sample size (HDLSS) data. The method is distance-based and takes advantage of the distance concentration phenomenon and the limiting values of the dissimilarity indices to construct clusters. We describe an algorithm that orders each row of the dissimilarity matrix to estimate the change points, which define cluster boundaries. We construct an agreement matrix of the Rand indices of the row clusters. The maximum of the row sums of the agreement matrix provides us with the best clusters. We prove that the new method achieves perfect clustering as the number of features diverges for a fixed sample size. Several examples are presented to illustrate the proposed method. We compare the new method with four other clustering techniques, including high-dimensional k-means, minimal spanning tree and Hierarchical Scan. The clustering methods are applied to the Lymphoma data set.
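The row-ordering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Euclidean distance, exactly two clusters, and a crude change-point estimate (the largest gap in each row's sorted distances). It selects the row clustering with the highest total Rand-index agreement, on the assumption that a higher Rand index means closer agreement between two clusterings. The function names `rand_index`, `row_labels` and `cluster_by_row_agreement`, and the simulated data, are hypothetical.

```python
import numpy as np

def rand_index(a, b):
    """Fraction of unordered point pairs on which two labelings agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[iu] == same_b[iu]))

def row_labels(d_row, i):
    """Two-cluster labeling from row i of the distance matrix.

    Assumption: one change point, estimated as the largest gap between
    consecutive sorted distances (the zero self-distance is excluded).
    """
    others = np.delete(np.arange(len(d_row)), i)
    order = others[np.argsort(d_row[others])]
    gaps = np.diff(d_row[order])
    cut = int(np.argmax(gaps)) + 1      # first `cut` neighbours join point i
    labels = np.ones(len(d_row), dtype=int)
    labels[i] = 0
    labels[order[:cut]] = 0
    return labels

def cluster_by_row_agreement(X):
    """Return the row clustering that agrees most with all the others."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))            # n x n Euclidean distances
    n = len(X)
    candidates = [row_labels(D[i], i) for i in range(n)]
    # Agreement matrix: Rand index between every pair of row clusterings.
    A = np.array([[rand_index(candidates[i], candidates[j])
                   for j in range(n)] for i in range(n)])
    best = int(np.argmax(A.sum(axis=1)))            # largest row sum
    return candidates[best], A

# Simulated HDLSS data (illustrative only): 10 observations, 2000 features,
# two groups that differ by a mean shift of 1 in every feature.
rng = np.random.default_rng(0)
n_per, p = 5, 2000
X = np.vstack([rng.normal(0.0, 1.0, (n_per, p)),
               rng.normal(1.0, 1.0, (n_per, p))])
truth = np.repeat([0, 1], n_per)
labels, A = cluster_by_row_agreement(X)
print(rand_index(labels, truth))
```

In this simulated setting the within-group and between-group distances concentrate around clearly separated values, so the largest gap in each sorted row falls at the group boundary. With more than two clusters, a multiple change-point estimator would be needed in place of the single largest gap.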
Date: 2025
Downloads: (external link)
http://hdl.handle.net/10.1080/10485252.2024.2378904 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:gnstxx:v:37:y:2025:i:2:p:319-343
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/GNST20
DOI: 10.1080/10485252.2024.2378904
Journal of Nonparametric Statistics is currently edited by Jun Shao
Bibliographic data for series maintained by Chris Longhurst.