EconPapers    
Economics at your fingertips  
 

Clustering high‐dimensional data via feature selection

Tianqi Liu, Yu Lu, Biqing Zhu and Hongyu Zhao

Biometrics, 2023, vol. 79, issue 2, 940-950

Abstract: High‐dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA‐seq data. In this paper, we propose a new clustering procedure called spectral clustering with feature selection (SC‐FS), where we first obtain an initial estimate of labels via spectral clustering, then select a small fraction of features with the largest R‐squared with these labels, that is, the proportion of variation explained by group labels, and conduct clustering again using selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves the minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC‐FS to four real‐world datasets demonstrate its usefulness in clustering high‐dimensional data.
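The three steps in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes exactly two clusters. It replaces the paper's spectral clustering step with a simplified variant that splits samples by the sign of the leading left singular vector of the column-centered data, computed by power iteration. The number of retained features is a user-supplied parameter. The per-feature score is the R-squared named in the abstract: for feature j, the between-group sum of squares divided by the total sum of squares. All helper names (`spectral_two_way`, `r_squared`, `sc_fs`, `simulate`, `agreement`) and the simulated sparse Gaussian mixture are hypothetical.

```python
import random

# Assumption: two clusters; the spectral step is a simplified stand-in,
# not the exact procedure analysed in the paper.


def center_columns(X):
    """Return a copy of X (list of rows) with every column shifted to mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def spectral_two_way(X, iters=60, seed=0):
    """Hypothetical two-cluster spectral step: split samples by the sign of the
    leading left singular vector of the centered data (power iteration on X X^T)."""
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(Xc[i][j] * v[i] for i in range(n)) for j in range(p)]  # X^T v
        v = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # X (X^T v)
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


def r_squared(column, labels):
    """Share of a feature's variation explained by the group labels:
    between-group sum of squares divided by total sum of squares."""
    n = len(column)
    grand = sum(column) / n
    total = sum((x - grand) ** 2 for x in column)
    if total == 0:
        return 0.0
    between = 0.0
    for g in set(labels):
        members = [x for x, lab in zip(column, labels) if lab == g]
        mean_g = sum(members) / len(members)
        between += len(members) * (mean_g - grand) ** 2
    return between / total


def sc_fs(X, n_features, seed=0):
    """SC-FS-style sketch: initial spectral labels, keep the n_features
    columns with the largest R^2, then cluster again on those columns."""
    initial = spectral_two_way(X, seed=seed)
    p = len(X[0])
    scores = [r_squared([row[j] for row in X], initial) for j in range(p)]
    selected = sorted(range(p), key=lambda j: scores[j], reverse=True)[:n_features]
    selected.sort()
    X_sel = [[row[j] for j in selected] for row in X]
    return spectral_two_way(X_sel, seed=seed), selected


def simulate(n=60, p=200, n_informative=10, shift=2.0, seed=1):
    """Hypothetical sparse two-component Gaussian mixture: only the first
    n_informative features have group means +shift / -shift; the rest are noise."""
    rng = random.Random(seed)
    truth = [i % 2 for i in range(n)]
    X = []
    for label in truth:
        sign = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(0.0, 1.0) + (sign * shift if j < n_informative else 0.0)
                  for j in range(p)])
    return X, truth


def agreement(a, b):
    """Clustering accuracy up to swapping the two cluster labels."""
    same = sum(x == y for x, y in zip(a, b)) / len(a)
    return max(same, 1.0 - same)


if __name__ == "__main__":
    X, truth = simulate()
    labels, selected = sc_fs(X, n_features=10)
    print("selected features:", selected)
    print("agreement with truth:", agreement(labels, truth))
```

In this simulated example only the first 10 of 200 features carry signal, so the second clustering pass works on a 10-column matrix instead of the full 200 columns. Extending the sketch to K > 2 groups would require the top K singular vectors followed by k-means, which is not implemented here.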

Date: 2023

Downloads:
https://doi.org/10.1111/biom.13665

Persistent link: https://EconPapers.repec.org/RePEc:bla:biomet:v:79:y:2023:i:2:p:940-950

Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=0006-341X

More articles in Biometrics from The International Biometric Society
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:940-950