EconPapers    
Economics at your fingertips  
 

Using Genetic Distance to Infer the Accuracy of Genomic Prediction

Marco Scutari, Ian Mackay and David Balding

PLOS Genetics, 2016, vol. 12, issue 9, 1-19

Abstract: The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.Author Summary: The availability of increasing amounts of genomic data is making the use of statistical models to predict traits of interest a mainstay of many applications in life sciences. Applications range from medical diagnostics for common and rare diseases to breeding characteristics such as disease resistance in plants and animals of commercial interest. We explored an implicit assumption of how such prediction models are often assessed: that the individuals whose traits we would like to predict originate from the same population as those that are used to train the models. This is commonly not the case, especially in the case of plants and animals that are parts of selection programs. To study this problem we proposed a model-agnostic approach to infer the accuracy of prediction models as a function of two common measures of genetic distance. Using data from plant, animal and human genetics, we find that accuracy decays approximately linearly in either of those measures. Quantifying this decay has fundamental applications in all branches of genetics, as it measures how studies generalise to different populations.

Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (6)

Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006288 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 06288&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1006288

DOI: 10.1371/journal.pgen.1006288

Access Statistics for this article

More articles in PLOS Genetics from Public Library of Science
Bibliographic data for series maintained by plosgenetics ().

 
Page updated 2025-03-19
Handle: RePEc:plo:pgen00:1006288