Sure independence screening in the presence of missing data

Adriano Zanin Zambom and Gregory J. Matthews
Additional contact information
Adriano Zanin Zambom: California State University, Northridge
Gregory J. Matthews: Loyola University Chicago

Statistical Papers, 2021, vol. 62, issue 2, No 12, 817-845

Abstract: Variable selection in ultra-high dimensional data sets is an increasingly prevalent issue with the readily available data arising from, for example, genome-wide association studies or gene expression data. When the dimension of the feature space is exponentially larger than the sample size, it is desirable to screen out unimportant predictors in order to bring the dimension down to a moderate scale. In this paper we consider the case when observations of the predictors are missing at random. We propose performing screening using the marginal linear correlation coefficient between each predictor and the response variable, accounting for the missing data using maximum likelihood estimation. This method is shown to have the sure screening property. Moreover, a novel method of screening that uses additional predictors when estimating the correlation coefficient is proposed. Simulations show that simply performing screening using pairwise complete observations is outperformed by both of the proposed methods and is not recommended. Finally, the proposed methods are applied to a gene expression study on prostate cancer.
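As a rough illustration of the screening idea in the abstract, the sketch below ranks predictors by absolute marginal correlation with the response in two ways. The first uses pairwise complete observations, the baseline the abstract advises against. The second uses a closed-form maximum likelihood estimate that holds under assumptions not stated in this record: each (predictor, response) pair is bivariate normal, the response is fully observed, and missing predictor values are coded as NaN. It is not claimed to be the authors' estimator, and it does not cover their second method that borrows information from additional predictors. The function names `pairwise_complete_scores`, `ml_scores`, and `screen` are hypothetical.

```python
import numpy as np


def pairwise_complete_scores(X, y):
    """|corr(X_j, y)| per column, using only rows where X_j is observed."""
    n, p = X.shape
    scores = np.zeros(p)
    for j in range(p):
        obs = ~np.isnan(X[:, j])
        if obs.sum() < 3:
            continue  # too few observed values to estimate a correlation
        xc = X[obs, j] - X[obs, j].mean()
        yc = y[obs] - y[obs].mean()
        denom = np.sqrt(np.mean(xc ** 2) * np.mean(yc ** 2))
        if denom > 0:
            scores[j] = abs(np.mean(xc * yc)) / denom
    return scores


def ml_scores(X, y):
    """|corr(X_j, y)| per column via a factored-likelihood ML estimate.

    Assumption (not from the source): bivariate normality and a fully
    observed response, so the variance of y is estimated from all n rows
    while the regression of X_j on y is estimated from complete rows.
    """
    n, p = X.shape
    var_y = y.var()  # ML variance of the response, all n rows
    scores = np.zeros(p)
    for j in range(p):
        obs = ~np.isnan(X[:, j])
        if obs.sum() < 3:
            continue
        xc = X[obs, j] - X[obs, j].mean()
        yc = y[obs] - y[obs].mean()
        syy_cc = np.mean(yc ** 2)
        if syy_cc == 0:
            continue
        beta = np.mean(xc * yc) / syy_cc               # slope of X_j on y
        resid_var = np.mean(xc ** 2) - beta ** 2 * syy_cc
        var_x = resid_var + beta ** 2 * var_y          # ML variance of X_j
        if var_x > 0:
            scores[j] = abs(beta) * np.sqrt(var_y / var_x)
    return scores


def screen(scores, d):
    """Indices of the d predictors with the largest scores."""
    return np.argsort(scores)[::-1][:d]
```

With no missing values the two scores coincide, since the complete rows are then all rows. A call such as `screen(ml_scores(X, y), d)` keeps the `d` top-ranked predictors for a subsequent modelling step; the choice of `d` is left to the user here.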

Keywords: Maximum likelihood estimator; Correlation coefficient; EM algorithm; Missing at random; Ultrahigh dimensionality
Date: 2021

Access to the full text of the articles in this series is restricted.

Ordering information: This journal article can be ordered from
http://www.springer. ... business/journal/362

DOI: 10.1007/s00362-019-01115-w

Statistical Papers is currently edited by C. Müller, W. Krämer and W.G. Müller

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2022-05-12
Handle: RePEc:spr:stpapr:v:62:y:2021:i:2:d:10.1007_s00362-019-01115-w