Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies

Cox Lwaka Tamba, Yuan-Li Ni and Yuan-Ming Zhang

PLOS Computational Biology, 2017, vol. 13, issue 1, 1-20

Abstract: Genome-wide association study (GWAS) entails examining a large number of single nucleotide polymorphisms (SNPs) in a limited sample with hundreds of individuals, implying a variable selection problem in the high dimensional dataset. Although many single-locus GWAS approaches under polygenic background and population structure controls have been widely used, some significant loci fail to be detected. In this study, we used an iterative modified-sure independence screening (ISIS) approach in reducing the number of SNPs to a moderate size. Expectation-Maximization (EM)-Bayesian least absolute shrinkage and selection operator (BLASSO) was used to estimate all the selected SNP effects for true quantitative trait nucleotide (QTN) detection. This method is referred to as ISIS EM-BLASSO algorithm. Monte Carlo simulation studies validated the new method, which has the highest empirical power in QTN detection and the highest accuracy in QTN effect estimation, and it is the fastest, as compared with efficient mixed-model association (EMMA), smoothly clipped absolute deviation (SCAD), fixed and random model circulating probability unification (FarmCPU), and multi-locus random-SNP-effect mixed linear model (mrMLM). To further demonstrate the new method, six flowering time traits in Arabidopsis thaliana were re-analyzed by four methods (New method, EMMA, FarmCPU, and mrMLM). As a result, the new method identified most previously reported genes. Therefore, the new method is a good alternative for multi-locus GWAS.

Author summary: Genome-wide association study is concerned with the associations between markers and traits of interest so as to identify all the significantly associated markers. In genome-wide association studies, hundreds of thousands of markers are genotyped for several hundreds of individuals. Usually, only a very minor subset of these markers is associated with the trait. Most penalization methods fail when the number of markers is much larger than the sample size. Based on this fact, we have developed an algorithm that proceeds in two stages. In the first stage (screening), we reduced the number of markers via correlation learning to a moderate size. We then used a moderate-scale variable selection method to select variables in the reduced model. Conditional on the selected variables, we repeated the screening procedure and chose another set of variables. In the second stage (estimation), all the above-selected variables are accurately estimated in a multi-locus model. Our approach is simple, accurate in estimation, fast and shows high statistical power of detecting relevant markers on simulated data. We have also used this method to identify relevant genes in real data analysis. We recommend our approach for conducting a multi-locus genome-wide association study.
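
The two-stage procedure described in the author summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a plain lasso fitted by coordinate descent stands in for the moderate-scale variable selection method, and an ordinary least-squares refit stands in for the EM-BLASSO estimation stage. The function names (`screen`, `lasso_cd`, `two_stage_sketch`), the parameters `k`, `lam` and `rounds`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def screen(X, r, k):
    """Indices of the k columns of X with the largest absolute correlation with r."""
    Xc = X - X.mean(axis=0)
    rc = r - r.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(rc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns score zero
    corr = np.abs(Xc.T @ rc) / denom
    return [int(j) for j in np.argsort(corr)[::-1][:k]]


def lasso_cd(X, y, lam, n_iter=200):
    """Plain lasso by coordinate descent (assumed stand-in for the selector)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]          # remove marker j from the fit
            rho = X[:, j] @ resid
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # put the updated marker back
    return beta


def two_stage_sketch(X, y, k=20, lam=0.1, rounds=2):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    selected, resid = [], yc.copy()
    # Stage 1 (screening): rank markers by correlation with the current
    # residual, select within the reduced pool, then repeat conditionally
    # on the markers selected so far.
    for _ in range(rounds):
        ranked = screen(Xc, resid, k + len(selected))
        pool = selected + [j for j in ranked if j not in selected][:k]
        beta = lasso_cd(Xc[:, pool], yc, lam)
        selected = [j for j, b in zip(pool, beta) if abs(b) > 1e-8]
        if not selected:
            break
        coef = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)[0]
        resid = yc - Xc[:, selected] @ coef
    # Stage 2 (estimation): joint refit of all selected markers.
    # Assumption: least squares here, where the paper uses EM-BLASSO.
    selected = sorted(selected)
    if selected:
        effects = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)[0]
    else:
        effects = np.zeros(0)
    return selected, effects


# Simulated example: 200 individuals, 1000 markers, 3 causal markers.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
true_effects = {5: 2.0, 50: -2.0, 500: 1.5}
y = sum(b * X[:, j] for j, b in true_effects.items()) + rng.standard_normal(200)

selected, effects = two_stage_sketch(X, y)
print(selected)
print(np.round(effects, 2))
```

The sketch keeps the number of markers entering the joint model at or below `rounds * k`, which is what makes the final multi-locus fit feasible when markers far outnumber individuals. It omits the polygenic background and population structure controls mentioned in the abstract.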

Date: 2017
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005357 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05357&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005357

DOI: 10.1371/journal.pcbi.1005357

Handle: RePEc:plo:pcbi00:1005357