Linkage Disequilibrium-Based Quality Control for Large-Scale Genetic Studies

Paul Scheet and Matthew Stephens

PLOS Genetics, 2008, vol. 4, issue 8, 1-9

Abstract: Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of “problem” SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).

Author Summary: In large-scale studies of population genetic data, particularly genome-wide association studies, considerable effort may be spent on quality control (QC) to ensure genotype data are accurate. Typically, QC steps are applied independently to individual marker loci, with data from suspicious loci being excluded from subsequent analyses. Here we present a new QC tool, which exploits the fact that correlation of alleles among nearby genetic loci (linkage disequilibrium; LD) provides a certain amount of redundancy in genotype information, and that high rates of genotyping error at a marker may leave their trace in unusual patterns of LD. The method (a) aids in the detection of SNP loci with possibly elevated levels of genotyping error, and (b) in some cases allows for the correction of erroneous genotype calls, thereby salvaging some of the genotype data from the QC filtering process. We confirm on data from real populations that SNPs identified by this approach do show evidence for containing actual genotyping errors, and we also examine genotype intensity plots to confirm that many individual genotypes corrected by the method do appear to be called in error. More generally, these results demonstrate the potential utility of incorporating LD information into algorithms for processing and analyzing population genotype data.
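
For context, the per-locus filters that the abstract contrasts with the LD-based approach (call rate and Hardy-Weinberg equilibrium, HWE) can be sketched as follows. This is a minimal illustration, not the authors' method and not part of fastPHASE. The genotype coding (0, 1, 2 copies of one allele, `None` for a failed call), the function names, and the thresholds are all assumptions made for the example.

```python
from math import erfc, sqrt

MISSING = None  # assumed encoding for a genotype the assay failed to call


def call_rate(genotypes):
    """Fraction of samples with a non-missing call at one SNP."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not MISSING]
    return len(called) / len(genotypes)


def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-degree-of-freedom chi-square test of HWE at one SNP.

    Genotypes are assumed to be coded as 0, 1, or 2 copies of one allele.
    """
    called = [g for g in genotypes if g is not MISSING]
    n = len(called)
    if n == 0:
        return 1.0
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is uninformative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def flag_snp(genotypes, min_call_rate=0.95, min_hwe_p=1e-4):
    """Flag a SNP for exclusion; both thresholds are illustrative only."""
    return (call_rate(genotypes) < min_call_rate
            or hwe_chi2_pvalue(genotypes) < min_hwe_p)
```

Filters of this kind look at each SNP in isolation and can only keep or discard it. As the summary above notes, the LD-based procedure additionally uses information from nearby loci, which is what allows some erroneous calls to be corrected instead of excluded.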

Date: 2008

Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000147 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 00147&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1000147

DOI: 10.1371/journal.pgen.1000147

Page updated 2025-03-19
Handle: RePEc:plo:pgen00:1000147