Transcriptome-wide root causal inference
Eric V Strobl and Eric R Gamazon
PLOS Computational Biology, 2025, vol. 21, issue 9, 1-35
Abstract:
Root causal genes correspond to the first gene expression levels perturbed during pathogenesis by genetic or non-genetic factors. Targeting root causal genes has the potential to alleviate disease entirely by eliminating pathology near its onset. No existing algorithm has been designed to discover root causal genes from observational data alone. We therefore propose the Transcriptome-Wide Root Causal Inference (TWRCI) algorithm that identifies root causal genes and their causal graph using a combination of genetic variant and unperturbed bulk RNA sequencing data. TWRCI uses a novel competitive regression procedure to annotate cis- and trans-genetic variants to the gene expression levels they directly cause. The algorithm simultaneously determines the sequence in which gene expression changes propagate through the system to pinpoint the underlying causal graph and estimate root causal effects. TWRCI outperforms alternative approaches across a diverse group of metrics by directly targeting root causal genes while accounting for distal relations, linkage disequilibrium, patient heterogeneity and widespread pleiotropy. We demonstrate the algorithm by uncovering the root causal mechanisms of two complex diseases, which we confirm by replication using independent genome-wide summary statistics.
Author summary:
Many diseases progress through causal chains. The earliest step detectable in gene expression is a small set of root causal genes: expression levels that change first after genetic or non-genetic triggers. Because gene expression is relatively easy to perturb, focusing on these early changes offers a tractable route to stopping disease with a sparse set of interventions. Yet most existing tools either require expensive perturbation screens or fail to distinguish true early causes from downstream consequences.
Transcriptome-Wide Root Causal Inference (TWRCI) uses widely available genotype data and bulk RNA-seq to identify these first expression events and quantify their patient-specific effects. TWRCI assigns each genetic variant to the single target it most directly influences—either a gene or the disease outcome—via a head-to-head prediction test, reconstructs the causal chain among genes, and estimates each gene’s patient-specific root causal effect, integrating genetic and non-genetic drivers into an interpretable effect size. In simulations and two diseases, TWRCI outperformed alternatives, recovered compact sets of early-acting genes consistent with known biology, detected variants that act directly on disease outside expression, and replicated across cohorts. Most variation in root causal effects was non-genetic, pointing to environmental triggers.
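As a rough illustration of the head-to-head idea (assigning each variant to the one target it most directly influences), the Python sketch below assigns every variant to whichever candidate target, a gene or the disease outcome, it is most strongly correlated with. This is not the authors' competitive regression procedure, which additionally handles linkage disequilibrium, pleiotropy and distal effects; the function name assign_variants, the absolute Pearson correlation score, and the simulated three-variant chain are all hypothetical choices made for this example.

```python
import numpy as np


def assign_variants(genotypes, targets, target_names):
    """Toy head-to-head assignment (hypothetical, not TWRCI itself).

    genotypes:    (n_samples, n_variants) allele counts (0/1/2)
    targets:      (n_samples, n_targets) candidate targets, e.g. gene
                  expression levels plus a column for the disease outcome
    target_names: one label per target column
    Returns one target label per variant: the target with the largest
    absolute Pearson correlation. Assumes no column has zero variance.
    """
    # Standardize columns so that G.T @ T / n is the correlation matrix.
    G = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    T = (targets - targets.mean(axis=0)) / targets.std(axis=0)
    n = genotypes.shape[0]
    corr = G.T @ T / n  # shape (n_variants, n_targets)
    # Each variant "competes" across targets; the strongest association wins.
    best = np.argmax(np.abs(corr), axis=1)
    return [target_names[j] for j in best]


# Simulated chain: variant0 -> gene_a -> gene_b -> disease,
# with variant1 acting on gene_b and variant2 acting on disease directly.
rng = np.random.default_rng(0)
n = 2000
g = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
gene_a = 1.0 * g[:, 0] + rng.normal(size=n)
gene_b = 0.8 * gene_a + 1.0 * g[:, 1] + rng.normal(size=n)
disease = 0.7 * gene_b + 1.0 * g[:, 2] + rng.normal(size=n)

names = ["gene_a", "gene_b", "disease"]
targets = np.column_stack([gene_a, gene_b, disease])
print(assign_variants(g, targets, names))
# expected: ['gene_a', 'gene_b', 'disease']
```

In this simulated chain, a variant acting on gene_a also correlates with gene_b and disease through downstream propagation, but added noise attenuates the association at each step, so the most direct target wins. Correlated variants in linkage disequilibrium or pleiotropic variants would break this simple argmax rule, which is why a more careful procedure is needed on real data.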
Date: 2025
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013461 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13461&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013461
DOI: 10.1371/journal.pcbi.1013461
More articles in PLOS Computational Biology from Public Library of Science