A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics
Geyu Zhou and Hongyu Zhao
PLOS Genetics, 2021, vol. 17, issue 7, 1-17
Abstract:
Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. Most existing methods make explicit assumptions about the underlying genetic architecture and need a separate validation data set for parameter tuning. To overcome these limitations, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between the summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.
Author summary:
Recently there has been much interest in predicting an individual’s phenotype from genetic information, which has great promise for disease prevention, monitoring, and treatment. The genetic architecture underlying different complex traits varies greatly, including the number of genetic variants involved and the distribution of their effect sizes. How to model this genetic contribution is a key aspect of accurate prediction of complex traits. So far, most existing methods make specific assumptions about the shape of the genetic contribution. If these assumptions are not correct, the prediction accuracy might be compromised.
Here we propose a method that learns the shape of the genetic contribution without making any explicit assumptions. We found that our method achieved robust performance when compared with other recently developed methods in simulations and real data analysis. Our method is also more practical, since it supports the use of public summary statistics and consumes only a small amount of computational resources.
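The abstract mentions that the block structure of the reference linkage disequilibrium (LD) matrix permits a parallel algorithm. The following is a minimal sketch of that idea only, not the SDPR sampler: it assumes the common summary-statistics likelihood in which marginal effects are distributed as N(R beta, R/n) and, as a simple stand-in for the nonparametric prior, a Gaussian prior N(0, sigma2 I) on the effects. Under those assumptions the posterior mean for each LD block is (R + I/(n sigma2))^-1 beta_hat, and blocks can be processed independently. The names `shrink_block`, `shrink_all_blocks`, `n`, and `sigma2` are hypothetical and do not come from the paper or its software.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def shrink_block(ld_block, beta_hat_block, n, sigma2):
    """Posterior mean of effects in one LD block.

    Assumes beta_hat ~ N(R beta, R / n) and a Gaussian prior
    beta ~ N(0, sigma2 * I). This prior is an illustrative stand-in,
    not the nonparametric prior described in the abstract.
    """
    p = ld_block.shape[0]
    return np.linalg.solve(ld_block + np.eye(p) / (n * sigma2), beta_hat_block)


def shrink_all_blocks(ld_blocks, beta_hat_blocks, n, sigma2, workers=4):
    """Process independent LD blocks in parallel and concatenate the results."""

    def run(pair):
        ld_block, beta_hat_block = pair
        return shrink_block(ld_block, beta_hat_block, n, sigma2)

    # A thread pool is used here for simplicity; blocks share no state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, zip(ld_blocks, beta_hat_blocks)))
    return np.concatenate(results)


# Toy usage with two small, made-up LD blocks (values are illustrative only).
toy_blocks = [
    np.array([[1.0, 0.3], [0.3, 1.0]]),
    np.array([[1.0, 0.1, 0.0], [0.1, 1.0, 0.2], [0.0, 0.2, 1.0]]),
]
toy_beta_hat = [np.array([0.02, -0.01]), np.array([0.00, 0.03, 0.01])]
toy_weights = shrink_all_blocks(toy_blocks, toy_beta_hat, n=10_000, sigma2=1e-4)
```

Because the LD matrix is treated as block diagonal, solving each block separately gives the same answer as solving the full system at once, which is what makes the per-block work easy to distribute.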
Date: 2021
Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009697 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 09697&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1009697
DOI: 10.1371/journal.pgen.1009697
More articles in PLOS Genetics from Public Library of Science