Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs
Qingbo S. Wang,
David R. Kelley,
Jacob Ulirsch,
Masahiro Kanai,
Shuvom Sadhuka,
Ran Cui,
Carlos Albors,
Nathan Cheng,
Yukinori Okada,
Francois Aguet,
Kristin G. Ardlie,
Daniel G. MacArthur and
Hilary K. Finucane
Additional contact information
Qingbo S. Wang: Broad Institute of MIT and Harvard
David R. Kelley: Calico Life Sciences
Jacob Ulirsch: Broad Institute of MIT and Harvard
Masahiro Kanai: Broad Institute of MIT and Harvard
Shuvom Sadhuka: Broad Institute of MIT and Harvard
Ran Cui: Broad Institute of MIT and Harvard
Carlos Albors: Broad Institute of MIT and Harvard
Nathan Cheng: Broad Institute of MIT and Harvard
Yukinori Okada: Osaka University Graduate School of Medicine
Francois Aguet: Broad Institute of MIT and Harvard
Kristin G. Ardlie: Broad Institute of MIT and Harvard
Daniel G. MacArthur: Centre for Population Genomics, Garvan Institute of Medical Research
Hilary K. Finucane: Broad Institute of MIT and Harvard
Nature Communications, 2021, vol. 12, issue 1, 1-11
Abstract:
The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods that assess variants’ effects on gene expression in the native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6,121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with that of other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putative causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
Date: 2021
Downloads: https://www.nature.com/articles/s41467-021-23134-8 (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-23134-8
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-021-23134-8
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie