LabWAS: Novel findings and study design recommendations from a meta-analysis of clinical labs in two independent biobanks
Jeffery A Goldstein,
Joshua S Weinstock,
Lisa A Bastarache,
Daniel B Larach,
Lars G Fritsche,
Ellen M Schmidt,
Chad M Brummett,
Sachin Kheterpal,
Goncalo R Abecasis,
Joshua C Denny and
Matthew Zawistowski
PLOS Genetics, 2020, vol. 16, issue 11, 1-23
Abstract:
Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data is complex and lacks a ubiquitous coding scheme, making it more challenging than diagnosis data. Here we describe the first large-scale cross-health system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known association for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternate summarizations can improve power in certain circumstances. This study provides a proof-of-principle for cross health system GWAS and is a framework for future studies of quantitative EHR lab traits.
Author summary:
Electronic Health Records (EHRs) have emerged as an abundant data source for deriving phenotypes used in genetic association studies. EHRs provide a broad range of clinical data in large health system cohorts and are readily incorporated into large-scale meta-analyses. The abundance of available data in EHRs introduces unique technical challenges, particularly longitudinal clinical lab measurements which lack the structure of more commonly used disease diagnosis codes. Conflicting strategies exist in the literature and it is not clear how portable these strategies are across health systems. In this study we performed a proof-of-principle meta-analysis of 70 clinical lab traits in two large-scale health systems: BioVU from Vanderbilt University and the Michigan Genomics Initiative from Michigan Medicine. Despite the challenges of matching labs across the two health systems, we observed a high replication rate for known genetic variants. Further, we identified 31 novel associations, 22 of which replicated in an independent BioVU cohort, indicating the potential for future meta-analyses. Finally, we explored the impact of various analytic strategies, looking for consistent effects between our two cohorts, to determine optimal strategies for future genetic analysis of EHR-derived lab traits.
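Illustrative sketch (not from the article): the abstract reports that the mean of all available lab measurements is a robust per-individual summary value, and that alternate summarizations can help in some circumstances. The Python below shows one way such summaries might be computed from longitudinal records. The function name `summarize_labs`, the record layout, the toy values, and the choice of median and first measurement as the alternates are all assumptions made for illustration; the abstract does not say which alternates the authors compared.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_labs(records):
    """Collapse repeated lab measurements into one summary row per person.

    records: iterable of (person_id, iso_date, value) tuples (hypothetical layout).
    Returns a dict mapping person_id to a dict of summary values.
    """
    by_person = defaultdict(list)
    for person_id, iso_date, value in records:
        by_person[person_id].append((iso_date, value))

    summaries = {}
    for person_id, observations in by_person.items():
        # ISO 8601 date strings sort chronologically as plain strings.
        observations.sort()
        values = [value for _, value in observations]
        summaries[person_id] = {
            "n": len(values),
            "mean": mean(values),      # summary the abstract describes as robust
            "median": median(values),  # illustrative alternate (assumption)
            "first": values[0],        # earliest measurement (assumption)
        }
    return summaries


# Toy data, invented for this example only.
records = [
    ("p1", "2015-03-01", 5.0),
    ("p1", "2016-07-15", 7.0),
    ("p1", "2014-01-10", 12.0),
    ("p2", "2017-05-02", 4.0),
]
result = summarize_labs(records)
```

In a GWAS setting, one of these per-person values would then serve as the quantitative phenotype; the summaries coincide for individuals with a single measurement and diverge as measurements accumulate.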
Date: 2020
Downloads:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009077 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 09077&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1009077
DOI: 10.1371/journal.pgen.1009077