Imputing genotypes using regularized generalized linear regression models
Wong William W.L.,
Griesman Josh and
Feng Zeny Z.
Additional contact information
Wong William W.L.: Toronto Health Economics and Technology Assessment Collaborative, Leslie Dan Faculty of Pharmacy, University of Toronto, 6th Floor, Room 658, 144 College Street, Toronto M5S 3M2, ON, Canada
Griesman Josh: Royal College of Surgeons in Ireland, 123 St. Stephens Green, Dublin 2, Ireland
Feng Zeny Z.: Department of Mathematics and Statistics, University of Guelph, 50 Stone Road East, Guelph N1G2W1, ON, Canada
Statistical Applications in Genetics and Molecular Biology, 2014, vol. 13, issue 5, 519-529
Abstract:
As genomic sequencing technologies continue to advance, researchers are furthering their understanding of the relationships between genetic variants and expressed traits. However, missing data can significantly limit the power of a genetic study. Here, the use of a regularized generalized linear model, denoted by GLMNET, is proposed to impute missing genotypes. The method aims to address certain limitations of earlier regression approaches with regard to genotype imputation, particularly the specification of the number of neighboring SNPs to be included for imputing the missing genotype. The performance of the GLMNET-based method is compared to the conventional multinomial regression method and two phase-based methods: fastPHASE and BEAGLE. Two simulation scenarios are evaluated: a sparse-missing model and a small-panel expansion model. The sparse-missing model simulates a scenario in which SNPs are missing in a random fashion across the genome. In the small-panel expansion model, a set of individuals is genotyped at only a subset of the SNPs of the large panel. Each imputation method is tested in the context of two data sets: Canadian Holstein cattle data and human HapMap CEU data. Results show that the proposed GLMNET method outperforms the other methods in the small-panel expansion scenario, whereas fastPHASE performs slightly better than the GLMNET method in the sparse-missing scenario.
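The two simulation scenarios described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `MISSING` marker, the 0/1/2 genotype coding, the toy panel size, and the exact-match accuracy score are all assumptions made here for illustration. It covers only how genotypes might be hidden and scored, not the GLMNET imputation step itself.

```python
import random

MISSING = None  # marker for an unobserved genotype call (an assumption of this sketch)


def mask_sparse(genotypes, missing_rate, rng):
    """Sparse-missing scenario: hide individual genotype calls at random."""
    masked = [row[:] for row in genotypes]
    hidden = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < missing_rate:
                row[j] = MISSING
                hidden.append((i, j))
    return masked, hidden


def mask_small_panel(genotypes, study_rows, small_panel_snps):
    """Small-panel expansion: study individuals keep only the small-panel SNPs."""
    keep = set(small_panel_snps)
    masked = [row[:] for row in genotypes]
    hidden = []
    for i in sorted(set(study_rows)):
        for j in range(len(masked[i])):
            if j not in keep:
                masked[i][j] = MISSING
                hidden.append((i, j))
    return masked, hidden


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of hidden genotype calls that were recovered exactly."""
    if not hidden:
        raise ValueError("no hidden genotypes to score")
    correct = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


rng = random.Random(0)
# Toy panel: 6 individuals x 8 SNPs, genotypes coded as allele counts 0/1/2.
panel = [[rng.choice([0, 1, 2]) for _ in range(8)] for _ in range(6)]

sparse_masked, sparse_hidden = mask_sparse(panel, missing_rate=0.1, rng=rng)
small_masked, small_hidden = mask_small_panel(
    panel, study_rows=[4, 5], small_panel_snps=[0, 3, 6]
)

print(len(small_hidden))  # 2 study individuals x 5 untyped SNPs = 10
```

In the sparse-missing case the hidden entries are scattered across all individuals, whereas in the small-panel case entire SNP columns are hidden for the study individuals, so an imputation method must rely on the fully genotyped individuals to learn how typed SNPs predict untyped ones.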
Keywords: elastic net; genotype imputation; regularized generalized linear models; single nucleotide polymorphism (SNP)
Date: 2014
Downloads: (external link)
https://doi.org/10.1515/sagmb-2012-0044 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:13:y:2014:i:5:p:11:n:1
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2012-0044
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
Bibliographic data for series maintained by Peter Golla.