ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls
Audrey E Hendricks,
Stephen C Billups,
Hamish N C Pike,
I Sadaf Farooqi,
Eleftheria Zeggini,
Stephanie A Santorico,
Inês Barroso and
Josée Dupuis
PLOS Genetics, 2018, vol. 14, issue 10, 1-14
Abstract:
A primary goal of the recent investment in sequencing is to detect novel genetic associations in health and disease improving the development of treatments and playing a critical role in precision medicine. While this investment has resulted in an enormous total number of sequenced genomes, individual studies of complex traits and diseases are often smaller and underpowered to detect rare variant genetic associations. Existing genetic resources such as the Exome Aggregation Consortium (>60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in these studies. Fully utilizing these and other existing sequencing resources may increase power and could be especially useful in studies where resources to sequence additional samples are limited. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in large part due to the lack of statistical methods that can appropriately use this summary level data. Here, we present a new method to incorporate external controls in case-control analysis called ProxECAT (Proxy External Controls Association Test). ProxECAT estimates enrichment of rare variants within a gene region using internally sequenced cases and external controls. We evaluated ProxECAT in simulations and empirical analyses of obesity cases using both low-depth of coverage (7x) whole-genome sequenced controls and ExAC as controls. We find that ProxECAT maintains the expected type I error rate with increased power as the number of external controls increases. With an accompanying R package, ProxECAT enables the use of publicly available allele frequencies as external controls in case-control analysis.
Author summary:
Recent investments have produced sequence data on millions of people with the number of sequenced individuals continuing to grow. Although large sequencing studies exist, most sequencing data is gathered and processed in much smaller units of hundreds to thousands of samples. These silos of data result in underpowered studies for rare-variant association of complex diseases. Existing genetic resources such as the Exome Aggregation Consortium (>60,000 exomes) and the Genome Aggregation Database (~140,000 sequenced samples) have the potential to be used as controls in rare variant studies of complex diseases and traits. However, to date, these large, publicly available genetic resources remain underutilized, or even misused, in part due to the high potential for bias caused by differences in sequencing technology and processing. Here we present a new method, Proxy External Controls Association Test (ProxECAT), to integrate sequencing data from different, previously incompatible sources. ProxECAT provides a robust approach to using publicly available sequencing data enabling case-control analysis when no or limited internal controls exist. Further, ProxECAT’s motivating insight, that readily available but often discarded information can be used as a proxy to adjust for differences in data generation, may motivate further method development in other big data technologies and platforms.
Date: 2018
Downloads:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007591 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 07591&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1007591
DOI: 10.1371/journal.pgen.1007591