EconPapers    
Economics at your fingertips  
 

Feature screening in large scale cluster analysis

Trambak Banerjee, Gourab Mukherjee and Peter Radchenko

Journal of Multivariate Analysis, 2017, vol. 161, issue C, 191-212

Abstract: We propose a novel methodology for feature screening in the clustering of massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex clustering criterion, we propose a highly scalable screening procedure that efficiently discards non-informative features by first computing a clustering score corresponding to the clustering tree constructed for each feature, and then thresholding the resulting values. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the “noise” features. These bounds imply perfect screening of non-informative features with high probability and are derived via careful analysis of the empirical processes corresponding to the clustering trees that are constructed for each of the features by the associated clustering procedure. Through extensive simulation experiments, we compare the performance of our proposed method with other screening approaches popularly used in cluster analysis and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies.

Keywords: Convex clustering; Empirical processes; High-dimensionality; Modality detection; Non-asymptotic screening rate; RNA-Seq data; Single-cell biology (search for similar items in EconPapers)
Date: 2017
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0047259X17300271
Full text for ScienceDirect subscribers only

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:jmvana:v:161:y:2017:i:c:p:191-212

Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/supportfaq.cws_home/regional
https://shop.elsevie ... _01_ooc_1&version=01

DOI: 10.1016/j.jmva.2017.08.001

Access Statistics for this article

Journal of Multivariate Analysis is currently edited by de Leeuw, J.

More articles in Journal of Multivariate Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:jmvana:v:161:y:2017:i:c:p:191-212