Estimating heritability and genetic correlations from large health datasets in the absence of genetic data

Gengjie Jia, Yu Li, Hanxin Zhang, Ishanu Chattopadhyay, Anders Boeck Jensen, David R. Blair, Lea Davis, Peter N. Robinson, Torsten Dahlén, Søren Brunak, Mikael Benson, Gustaf Edgren, Nancy J. Cox, Xin Gao and Andrey Rzhetsky
Additional contact information
Gengjie Jia: University of Chicago
Yu Li: Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST)
Hanxin Zhang: University of Chicago
Ishanu Chattopadhyay: University of Chicago
Anders Boeck Jensen: Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
David R. Blair: University of California San Francisco
Lea Davis: Vanderbilt University
Peter N. Robinson: Jackson Laboratory for Genomic Medicine
Torsten Dahlén: Karolinska Institutet
Søren Brunak: University of Copenhagen
Mikael Benson: Linköping University
Gustaf Edgren: Karolinska Institutet
Nancy J. Cox: Vanderbilt University
Xin Gao: Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST)
Andrey Rzhetsky: University of Chicago

Nature Communications, 2019, vol. 10, issue 1, 1-11

Abstract: Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman’s ρ = 0.32, p

Date: 2019

Downloads: (external link)
https://www.nature.com/articles/s41467-019-13455-0 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:10:y:2019:i:1:d:10.1038_s41467-019-13455-0

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-019-13455-0

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:10:y:2019:i:1:d:10.1038_s41467-019-13455-0