EconPapers    
Economics at your fingertips  
 

High-dimensional variable selection with the plaid mixture model for clustering

Thierry Chekouo () and Alejandro Murua ()
Additional contact information
Thierry Chekouo: University of Minnesota Duluth
Alejandro Murua: Université de Montréal

Computational Statistics, 2018, vol. 33, issue 3, No 18, 1475-1496

Abstract: Abstract With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing these data. It performs simultaneously clustering and variable selection. The method is inspired by the plaid model. It may be seen as a multiplicative mixture model that allows for overlapping clustering. Unlike conventional clustering, within this model an observation may be explained by several clusters. This characteristic makes it specially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology, in terms of both variable selection and clustering. An application of our approach to the gene expression data of kidney renal cell carcinoma taken from The Cancer Genome Atlas validates some previously identified cancer biomarkers.

Keywords: Classification; Model selection; Multiplicative mixture model; Monte Carlo EM; Kidney cancer genomic data (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1007/s00180-018-0818-7 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:33:y:2018:i:3:d:10.1007_s00180-018-0818-7

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2

DOI: 10.1007/s00180-018-0818-7

Access Statistics for this article

Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik

More articles in Computational Statistics from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:compst:v:33:y:2018:i:3:d:10.1007_s00180-018-0818-7