The use of vector bootstrapping to improve variable selection precision in Lasso models
Laurin Charles,
Boomsma Dorret and
Lubke Gitta
Additional contact information
Laurin Charles: Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK; Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA
Boomsma Dorret: Department of Biological Psychology, VU University Amsterdam, Amsterdam, 1081 HV, Netherlands
Lubke Gitta: Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA; Department of Biological Psychology, VU University Amsterdam, Amsterdam, 1081 HV, Netherlands
Statistical Applications in Genetics and Molecular Biology, 2016, vol. 15, issue 4, 305-320
Abstract:
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
Keywords: additive-by-additive epistasis; association; bootstrap; Lasso; polygenic model; variable selection
Date: 2016
Downloads:
https://doi.org/10.1515/sagmb-2015-0043 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:15:y:2016:i:4:p:305-320:n:3
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2015-0043
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.