A Prism Vote method for individualized risk prediction of traits in genotype data of Multi-population
Xiaoxuan Xia,
Yexian Zhang,
Rui Sun,
Yingying Wei,
Qi Li,
Marc Ka Chun Chong,
William Ka Kei Wu,
Benny Chung-Ying Zee,
Hua Tang and
Maggie Haitian Wang
PLOS Genetics, 2022, vol. 18, issue 10, 1-16
Abstract:
Multi-population cohorts offer unprecedented opportunities for profiling disease risk in large samples; however, heterogeneous risk effects underlying complex traits across populations make integrative prediction challenging. In this study, we propose a novel Bayesian probability framework, the Prism Vote (PV), to construct risk predictions in heterogeneous genetic data. The PV views the trait of an individual as a composite risk from subpopulations, in which stratum-specific predictors can be formed in data of more homogeneous genetic structure. Since each individual is described by a composition of subpopulation memberships, the framework enables individualized risk characterization. Simulations demonstrated that the PV framework, applied with alternative prediction methods, significantly improved prediction accuracy in mixed and admixed populations. The advantage of PV grows as genetic heterogeneity and sample size increase. In two real genome-wide association datasets consisting of multiple populations, we showed that the framework considerably enhanced the prediction accuracy of the linear mixed model in five-group cross validations. The proposed method offers a new perspective for analyzing an individual's disease risk and improves accuracy for predicting complex traits in genotype data.
Author summary: In this study, we developed a statistical approach to dissect and predict human complex traits using genotype data. Distinct from existing methods that focus on refining the effect sizes of genetic factors, the proposed method, Prism Vote, improves risk prediction from the dimension of the individual: the disease probability of a subject is regarded as a composite risk shaded from multiple subpopulations, thereby drawing information from both stratum-specific estimation and individualized risk composition. We showed in simulation studies that the PV significantly enhanced the prediction performance of several base prediction models, particularly when genetic heterogeneity in the data is high. We also demonstrated in real genome-wide association study data of mixed populations that the PV considerably enhanced the prediction accuracy of linear mixed models for traits including body-mass index, height, hypertension, and others. The PV framework offers an effective and scalable approach to leverage subpopulation information to perform risk prediction in mixed populations.
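The composite-risk idea described in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading, that an individual's risk is a membership-weighted average of stratum-specific predictions (the law of total probability over subpopulations). The abstract does not say how memberships or stratum-specific predictors are estimated, so both are taken as given inputs here; the function name `combine_stratum_risks` and the toy numbers are hypothetical and are not taken from the article.

```python
import numpy as np


def combine_stratum_risks(memberships, stratum_risks):
    """Membership-weighted average of stratum-specific risk predictions.

    memberships:   (n_individuals, n_strata) nonnegative weights describing how
                   strongly each individual belongs to each subpopulation.
                   Rows are normalised here so that they sum to 1.
    stratum_risks: (n_individuals, n_strata) risk predicted for each individual
                   by each stratum-specific model (assumed to be given).
    Returns a length-n_individuals array of composite risks.
    """
    w = np.asarray(memberships, dtype=float)
    r = np.asarray(stratum_risks, dtype=float)
    if w.shape != r.shape:
        raise ValueError("memberships and stratum_risks must have the same shape")
    if np.any(w < 0):
        raise ValueError("memberships must be nonnegative")
    row_sums = w.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("each individual needs at least one nonzero membership")
    w = w / row_sums  # normalise each row to a probability vector
    return (w * r).sum(axis=1)


# Toy example (made-up numbers): two subpopulations, three individuals.
memberships = np.array([
    [1.0, 0.0],   # belongs entirely to stratum 0
    [0.5, 0.5],   # evenly admixed
    [0.2, 0.8],   # mostly stratum 1
])
stratum_risks = np.array([
    [0.10, 0.30],
    [0.10, 0.30],
    [0.10, 0.30],
])
composite = combine_stratum_risks(memberships, stratum_risks)
```

With these toy inputs the composite risks are roughly 0.10, 0.20 and 0.26: the more an individual's membership shifts towards the second stratum, the closer the composite risk moves to that stratum's prediction.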
Date: 2022
Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010443 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 10443&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1010443
DOI: 10.1371/journal.pgen.1010443
More articles in PLOS Genetics from Public Library of Science