Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

Palmer, Cameron; Pe’er, Itsik

PLOS Genetics, 2016, vol. 12, issue 6, 1-17

Abstract: Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.

Author Summary: Genetic research has focused on the analysis of data points that are assumed to be deterministically known. However, the majority of current high-throughput data is only probabilistically known, and proper methods for handling such uncertain genotypes are limited. Here, we build on existing theory from the field of statistics to introduce a general framework for handling probabilistic genotype data obtained through genotype imputation. This framework, called Multiple Imputation, matches or improves upon existing methods for handling uncertainty in basic analysis of genetic association. Unlike those methods, our work also extends to more advanced analyses, such as mixed-effects models, with no additional complication. Importantly, it generates posterior probabilities of association that are intrinsically weighted by the certainty of the underlying data, a feature unmatched by other existing methods. Multiple Imputation is also fully compatible with meta-analysis. Finally, our analysis of probabilistic genotype data brings into focus the accuracy and unreliability of imputation’s estimated probabilities. Taken together, these results substantially increase the utility of imputed genotypes in statistical genetics, and may have strong implications for analysis of sequencing data moving forward.
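
The general Multiple Imputation idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not show. It assumes a single SNP, a quantitative phenotype, a plain linear regression as the association test, and pooling by Rubin's rules; the function names (`sample_genotypes`, `ols_slope`, `rubin_pool`, `mi_association`) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def sample_genotypes(probs, rng):
    """Draw one hard genotype (0, 1, or 2 alternate alleles) per individual
    from that individual's genotype probability triple."""
    return [rng.choices((0, 1, 2), weights=p)[0] for p in probs]


def ols_slope(x, y):
    """Simple linear regression of y on x.
    Returns (slope estimate, squared standard error of the slope)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("sampled genotypes are all identical; slope undefined")
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta, rss / (n - 2) / sxx


def rubin_pool(estimates, variances):
    """Combine per-imputation results with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    within = statistics.fmean(variances)    # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    return q_bar, within + (1 + 1 / m) * between


def mi_association(probs, phenotype, m=20, seed=0):
    """Assumed workflow: sample m complete genotype datasets, fit each,
    then pool the m slope estimates."""
    rng = random.Random(seed)
    fits = [ols_slope(sample_genotypes(probs, rng), phenotype) for _ in range(m)]
    return rubin_pool([f[0] for f in fits], [f[1] for f in fits])
```

In this sketch, uncertain genotypes inflate the between-imputation variance, so the pooled variance grows as the genotype probabilities become less decisive. Any existing association test that returns an estimate and its variance could stand in for `ols_slope`.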

Date: 2016

Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006091 (text/html)
https://journals.plos.org/plosgenetics/article?id= ... 06091&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1006091

DOI: 10.1371/journal.pgen.1006091

More articles in PLOS Genetics from Public Library of Science