Clustering of high-dimensional observations

Wang, Yong; Modarres, Reza

Journal of Nonparametric Statistics, 2025, vol. 37, issue 2, 319-343

Abstract: We present a novel clustering method for high-dimensional, low sample size (HDLSS) data. The method is distance-based and uses the distance concentration phenomenon and the limiting values of the dissimilarity indices to construct clusters. We describe an algorithm that orders each row of the dissimilarity matrix to estimate the change points, which define cluster boundaries. We then construct an agreement matrix of the Rand indices of the row clusters; the row with the maximum row sum provides the best clusters. We prove that the new method achieves perfect clustering as the number of features diverges for a fixed sample size. Several examples illustrate the proposed method. We compare the new method with four other clustering techniques, including high-dimensional k-means, minimal spanning tree and Hierarchical Scan, and apply the clustering methods to the Lymphoma data set.
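
The row-wise procedure outlined in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses Euclidean distance as the dissimilarity, splits each row into only two clusters, places the change point at the largest gap between consecutive sorted dissimilarities (a simple stand-in for the paper's change-point estimator), and selects the row whose clustering agrees most with the others, read here as the largest row sum of Rand indices. The function names `rand_index`, `row_clustering` and `cluster_by_row_agreement` are hypothetical.

```python
import numpy as np


def rand_index(a, b):
    """Rand index: fraction of observation pairs on which two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


def row_clustering(dist_row, i):
    """Two-cluster labels derived from row i of the dissimilarity matrix.

    Assumption: the change point sits at the largest gap between consecutive
    sorted dissimilarities; the paper's estimator may differ.
    """
    n = len(dist_row)
    labels = np.zeros(n, dtype=int)  # observation i keeps label 0
    others = np.array([j for j in range(n) if j != i])
    order = others[np.argsort(dist_row[others])]  # nearest to farthest
    if len(order) < 2:
        return labels
    gaps = np.diff(dist_row[order])
    cut = int(np.argmax(gaps)) + 1  # first index past the largest gap
    labels[order[cut:]] = 1
    return labels


def cluster_by_row_agreement(X):
    """Return labels from the row that agrees most with all other rows."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))  # Euclidean dissimilarity matrix
    row_labels = [row_clustering(D[i], i) for i in range(n)]
    # Agreement matrix: Rand index between every pair of row clusterings.
    A = np.array([[rand_index(row_labels[i], row_labels[j]) for j in range(n)]
                  for i in range(n)])
    best = int(np.argmax(A.sum(axis=1)))
    return row_labels[best], A
```

Calling `cluster_by_row_agreement(X)` on an `n × d` array returns one label vector and the `n × n` agreement matrix. The pairwise-difference array needs `n × n × d` floats, which suits the HDLSS setting (small `n`) but not large samples.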

Date: 2025

Downloads: (external link)
http://hdl.handle.net/10.1080/10485252.2024.2378904 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:gnstxx:v:37:y:2025:i:2:p:319-343

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/GNST20

DOI: 10.1080/10485252.2024.2378904

Journal of Nonparametric Statistics is currently edited by Jun Shao

Bibliographic data for series maintained by Chris Longhurst.