An ensemble penalized regression method for multi-ancestry polygenic risk prediction

Zhang, Jingning; Zhan, Jianan; Jin, Jin; Ma, Cheng; Zhao, Ruzhang; O’Connell, Jared; Jiang, Yunxuan; Koelsch, Bertram L.; Zhang, Haoyu; Chatterjee, Nilanjan

Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O’Connell, Yunxuan Jiang, Bertram L. Koelsch, Haoyu Zhang and Nilanjan Chatterjee
Additional contact information
Jingning Zhang: Johns Hopkins Bloomberg School of Public Health
Jianan Zhan: 23andMe Inc.
Jin Jin: University of Pennsylvania
Cheng Ma: University of Michigan
Ruzhang Zhao: Johns Hopkins Bloomberg School of Public Health
Jared O’Connell: 23andMe Inc.
Yunxuan Jiang: 23andMe Inc.
Bertram L. Koelsch: 23andMe Inc.
Haoyu Zhang: National Cancer Institute
Nilanjan Chatterjee: Johns Hopkins Bloomberg School of Public Health

Nature Communications, 2024, vol. 15, issue 1, 1-14

Abstract: Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of $\mathscr{L}_1$ (lasso) and $\mathscr{L}_2$ (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction $R^2$ for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
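
To make the idea in the abstract concrete, the following is a minimal toy sketch, not the PROSPER implementation. The abstract gives no formula, so everything here is an assumption made for illustration: the objective pairs a per-population lasso penalty with a ridge penalty on the difference between two populations' effect vectors; it is fitted on simulated individual-level data rather than GWAS summary statistics; and a plain least-squares combination stands in for whatever ensemble learner the paper uses. All function and variable names (`fit_two_populations`, `ensemble_weights`, `lam1`, `lam2`, `c`) are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm (lasso shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_two_populations(X1, y1, X2, y2, lam1, lam2, c, n_iter=500):
    """Proximal-gradient fit of an ASSUMED toy objective (not taken from the paper):
    sum_k ||y_k - X_k b_k||^2 / (2 n_k) + lam_k ||b_k||_1 + (c / 2) ||b_1 - b_2||^2
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    b1, b2 = np.zeros(p), np.zeros(p)
    # Step size from an upper bound on the Lipschitz constant of the smooth part.
    lipschitz = max(np.linalg.norm(X1, 2) ** 2 / n1,
                    np.linalg.norm(X2, 2) ** 2 / n2) + 2.0 * c
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradients of the squared-error terms plus the cross-population ridge term.
        g1 = X1.T @ (X1 @ b1 - y1) / n1 + c * (b1 - b2)
        g2 = X2.T @ (X2 @ b2 - y2) / n2 + c * (b2 - b1)
        # Gradient step followed by lasso shrinkage.
        b1 = soft_threshold(b1 - step * g1, step * lam1)
        b2 = soft_threshold(b2 - step * g2, step * lam2)
    return b1, b2

def ensemble_weights(prs_matrix, y_tune):
    """Least-squares weights (intercept first) combining candidate scores."""
    design = np.column_stack([np.ones(len(y_tune)), prs_matrix])
    weights, *_ = np.linalg.lstsq(design, y_tune, rcond=None)
    return weights

# Simulated toy data: a large population 1 and a smaller target population 2
# whose true effects are correlated with, but not identical to, population 1.
rng = np.random.default_rng(0)
p = 10
beta1 = rng.normal(size=p) * (rng.random(p) < 0.5)
beta2 = beta1 + 0.3 * rng.normal(size=p)
X1 = rng.normal(size=(400, p))
y1 = X1 @ beta1 + rng.normal(size=400)
X2 = rng.normal(size=(100, p))
y2 = X2 @ beta2 + rng.normal(size=100)
X_tune = rng.normal(size=(100, p))
y_tune = X_tune @ beta2 + rng.normal(size=100)

# One candidate score per penalty setting, evaluated on the tuning sample.
grid = [(lam, c) for lam in (0.01, 0.1) for c in (0.0, 1.0)]
prs = np.column_stack([
    X_tune @ fit_two_populations(X1, y1, X2, y2, lam, lam, c)[1]
    for lam, c in grid
])
weights = ensemble_weights(prs, y_tune)
```

In this sketch, raising `c` pulls the two populations' effect estimates toward each other, so the smaller sample borrows strength from the larger one, while the lasso terms keep each effect vector sparse. The final step weights the candidate scores from the penalty grid instead of selecting a single setting.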

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41467-024-47357-7 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-47357-7

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-47357-7

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.