Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data
Jianxin Shi,
Ju-Hyun Park,
Jubao Duan,
Sonja T Berndt,
Winton Moy,
Kai Yu,
Lei Song,
William Wheeler,
Xing Hua,
Debra Silverman,
Montserrat Garcia-Closas,
Chao Agnes Hsiung,
Jonine D Figueroa,
Victoria K Cortessis,
Núria Malats,
Margaret R Karagas,
Paolo Vineis,
I-Shou Chang,
Dongxin Lin,
Baosen Zhou,
Adeline Seow,
Keitaro Matsuo,
Yun-Chul Hong,
Neil E Caporaso,
Brian Wolpin,
Eric Jacobs,
Gloria M Petersen,
Alison P Klein,
Donghui Li,
Harvey Risch,
Alan R Sanders,
Li Hsu,
Robert E Schoen,
Hermann Brenner,
MGS (Molecular Genetics of Schizophrenia) GWAS Consortium,
GECCO (The Genetics and Epidemiology of Colorectal Cancer Consortium),
The GAME-ON/TRICL (Transdisciplinary Research in Cancer of the Lung) GWAS Consortium,
PRACTICAL (PRostate cancer AssoCiation group To Investigate Cancer Associated aLterations) Consortium,
PanScan Consortium,
The GAME-ON/ELLIPSE Consortium,
Rachael Stolzenberg-Solomon,
Pablo Gejman,
Qing Lan,
Nathaniel Rothman,
Laufey T Amundadottir,
Maria Teresa Landi,
Douglas F Levinson,
Stephen J Chanock and
Nilanjan Chatterjee
PLOS Genetics, 2016, vol. 12, issue 12, 1-24
Abstract:
Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner’s-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner’s curse correction uniformly led to enhancement of performance of the models, whereas incorporation of functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (25–50% increase in the prediction R²) for 5 of 14 diseases. As an example, for GWAS of type 2 diabetes, winner’s curse correction improved prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when shown to be highly enriched for GWAS-heritability, does not lead to proportionate improvement in genetic risk-prediction because of non-uniform linkage disequilibrium structure.
Author Summary:
Large GWAS have identified tens or even hundreds of common SNPs significantly associated with individual complex diseases; however, these SNPs typically explain a small proportion of phenotypic variance. Recently, heritability analyses based on GWAS data have suggested that common SNPs have the potential to explain a substantially larger fraction of phenotypic variance and to improve genetic risk prediction. Because of the polygenic nature of these diseases, improving genetic risk prediction typically requires substantially increasing the sample size in the discovery set. Thus, it is crucial to develop more efficient algorithms using existing GWAS summary data. In this article, we extend the polygenic risk score (PRS) method by adjusting the marginal effect size of SNPs for winner’s curse and by incorporating external functional annotation data. Theoretical analysis and simulation studies show that the performance improvement depends on the genetic architecture of the trait, the sample size of the discovery set, the degree of enrichment of association for SNPs annotated as “high-prior”, and the linkage disequilibrium patterns of these SNPs. We applied our method to the summary data of 14 GWAS. Our method achieved a 25–50% gain in efficiency (measured in the prediction R²) for 5 of 14 diseases compared to the standard PRS.
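The two modifications described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The soft-threshold (lasso-style) shrinkage is assumed here as one simple form of threshold-dependent adjustment, and the function names, the SNP record layout (`id`, `beta`, `se`, `high_prior`) and the example thresholds are all hypothetical. Linkage disequilibrium pruning, which a real PRS pipeline would need, is left out.

```python
from statistics import NormalDist


def z_cutoff(p_threshold):
    """Two-sided z-score cutoff corresponding to a p-value threshold."""
    return NormalDist().inv_cdf(1.0 - p_threshold / 2.0)


def shrink_weight(beta, se, p_threshold):
    """Shrink a marginal estimate toward zero by the selection cutoff.

    Assumption: soft-thresholding is used only to illustrate an adjustment
    that depends on the selection threshold.
    """
    lam = z_cutoff(p_threshold) * se  # cutoff expressed on the beta scale
    magnitude = max(abs(beta) - lam, 0.0)
    return magnitude if beta >= 0 else -magnitude


def build_weights(snps, p_high, p_low):
    """Select SNPs with a category-specific threshold and shrink their weights.

    snps: list of dicts with keys "id", "beta", "se", "high_prior"
    (a hypothetical layout). High-prior SNPs are tested against p_high,
    all other SNPs against p_low.
    """
    weights = {}
    for snp in snps:
        p_thr = p_high if snp["high_prior"] else p_low
        z = abs(snp["beta"] / snp["se"])
        if z > z_cutoff(p_thr):
            weights[snp["id"]] = shrink_weight(snp["beta"], snp["se"], p_thr)
    return weights


def polygenic_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2) over selected SNPs."""
    return sum(w * genotypes.get(snp_id, 0) for snp_id, w in weights.items())


# Toy summary statistics, invented for illustration only.
snps = [
    {"id": "rsA", "beta": 0.30, "se": 0.10, "high_prior": False},
    {"id": "rsB", "beta": 0.15, "se": 0.10, "high_prior": True},
    {"id": "rsC", "beta": 0.15, "se": 0.10, "high_prior": False},
    {"id": "rsD", "beta": -0.30, "se": 0.10, "high_prior": False},
]
weights = build_weights(snps, p_high=0.5, p_low=0.05)
score = polygenic_score({"rsA": 2, "rsB": 1, "rsD": 1}, weights)
```

In this toy input, rsB and rsC have identical summary statistics, but only rsB is selected, because it is tested against the more liberal high-prior threshold. Every selected weight is smaller in magnitude than its unadjusted estimate, which is the intended effect of a winner's curse correction.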
Date: 2016
Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006493 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 06493&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1006493
DOI: 10.1371/journal.pgen.1006493