Stable Variable Selection for High-Dimensional Genomic Data with Strong Correlations
Reetika Sarkar (),
Sithija Manage and
Xiaoli Gao
Additional contact information
Reetika Sarkar: University of North Carolina at Greensboro
Sithija Manage: Texas A&M University
Xiaoli Gao: Meta Platforms
Annals of Data Science, 2024, vol. 11, issue 4, No 2, 1139-1164
Abstract:
Abstract High-dimensional genomic data studies are often found to exhibit strong correlations, which results in instability and inconsistency in the estimates obtained using commonly used regularization approaches including the Lasso and MCP, etc. In this paper, we perform comparative study of regularization approaches for variable selection under different correlation structures and propose a two-stage procedure named rPGBS to address the issue of stable variable selection in various strong correlation settings. This approach involves repeatedly running a two-stage hierarchical approach consisting of a random pseudo-group clustering and bi-level variable selection. Extensive simulation studies and high-dimensional genomic data analysis on real datasets have demonstrated the advantage of the proposed rPGBS method over some of the most used regularization methods. In particular, rPGBS results in more stable selection of variables across a variety of correlation settings, as compared to some recent methods addressing variable selection with strong correlations: Precision Lasso (Wang et al. in Bioinformatics 35:1181–1187, 2019) and Whitening Lasso (Zhu et al. in Bioinformatics 37:2238–2244, 2021). Moreover, rPGBS has been shown to be computationally efficient across various settings.
Keywords: Bi-level sparsity; Minimax concave penalty; Stability; Strong correlation; Variable selection (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s40745-023-00481-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:11:y:2024:i:4:d:10.1007_s40745-023-00481-5
Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745
DOI: 10.1007/s40745-023-00481-5
Access Statistics for this article
Annals of Data Science is currently edited by Yong Shi
More articles in Annals of Data Science from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().