Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies
Nicoló Fusi, Oliver Stegle and Neil D Lawrence
PLOS Computational Biology, 2012, vol. 8, issue 1, 1-9
Abstract:
Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.
Author Summary:
The computational analysis of genetical genomics studies is challenged by confounding variation that is unrelated to the genetic factors of interest. Several approaches to account for these confounding factors have been proposed, greatly increasing the sensitivity in recovering direct genetic (cis) associations between variable genetic loci and the expression levels of individual genes. Crucially, these existing techniques largely rely on the true association signals being orthogonal to the confounding variation. Here, we show that when studying indirect (trans) genetic effects, for example from master regulators, their association signals can overlap with confounding factors estimated using existing methods. This technical overlap can lead to overcorrection, erroneously explaining away true associations as confounders. To address these shortcomings, we propose PANAMA, a model that jointly learns hidden factors while accounting for the effect of selected genetic regulators. In applications to several studies, PANAMA is more accurate than existing methods in recovering the hidden confounding factors. As a result, we find an increase in the statistical power for direct (cis) and indirect (trans) associations. Most strikingly, on yeast, PANAMA not only finds additional associations but also identifies master regulators that can be better reproduced between independent studies.
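The overcorrection described above can be illustrated with a small simulation. The sketch below is not the PANAMA model or its software: it is a toy stand-in that uses plain principal components, assumes the master regulator is already known, and simulates all data. The function names, effect sizes and sample sizes are hypothetical choices made for illustration only. It compares removing the top principal components of the expression matrix directly with estimating the hidden factor only after the regulator's effect has been regressed out.

```python
import numpy as np

def top_factors(y, k):
    """Return the top-k left singular vectors of a column-centred matrix."""
    u, _, _ = np.linalg.svd(y, full_matrices=False)
    return u[:, :k]

def remove_factors(y, u):
    """Project the columns of y onto the orthogonal complement of span(u)."""
    return y - u @ (u.T @ y)

def mean_abs_corr(snp, y):
    """Mean absolute Pearson correlation between the SNP and every gene."""
    s = (snp - snp.mean()) / snp.std()
    z = (y - y.mean(axis=0)) / y.std(axis=0)
    return float(np.mean(np.abs(s @ z) / len(s)))

def simulate(n=200, g=100, seed=0):
    """Toy data: one trans-acting SNP and one hidden confounder, both
    affecting every gene (all sizes and effects are assumptions)."""
    rng = np.random.default_rng(seed)
    snp = rng.binomial(2, 0.5, size=n).astype(float)
    hidden = rng.normal(size=n)
    weights = rng.normal(size=g)
    expr = (np.outer(snp, np.ones(g))      # master-regulator effect
            + np.outer(hidden, weights)    # confounding variation
            + rng.normal(size=(n, g)))     # independent noise
    return snp, expr - expr.mean(axis=0)

def compare(seed=0):
    snp, y = simulate(seed=seed)

    # Naive correction: remove the two strongest principal components.
    # The SNP effect is broad, so one of them captures it.
    naive = remove_factors(y, top_factors(y, 2))

    # Regulator-aware correction: regress the SNP out first, estimate the
    # hidden factor from the residuals, then remove only that factor.
    s = snp - snp.mean()
    resid = y - np.outer(s, s @ y) / (s @ s)
    aware = remove_factors(y, top_factors(resid, 1))

    return mean_abs_corr(snp, naive), mean_abs_corr(snp, aware)

naive_corr, aware_corr = compare()
print(f"naive PCA correction:       {naive_corr:.3f}")
print(f"regulator-aware correction: {aware_corr:.3f}")
```

In this toy setting the naive correction absorbs the regulator's signal into a "confounder", while the regulator-aware variant keeps it, because the factor estimated from the residuals is orthogonal to the SNP by construction. The model reported in the paper learns factors and regulators jointly within a probabilistic framework, which this two-step projection does not attempt to reproduce.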
Date: 2012
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002330 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 02330&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1002330
DOI: 10.1371/journal.pcbi.1002330