Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages

Chen Wang, Havell Markus, Avantika R. Diwadkar, Chachrit Khunsriraksakul, Laura Carrel, Bingshan Li, Xue Zhong, Xingyan Wang, Xiaowei Zhan, Galen T. Foulke, Nancy J. Olsen, Dajiang J. Liu and Bibo Jiang
Additional contact information
Chen Wang: College of Medicine, Penn State University
Havell Markus: College of Medicine, Penn State University
Avantika R. Diwadkar: College of Medicine, Penn State University
Chachrit Khunsriraksakul: College of Medicine, Penn State University
Laura Carrel: College of Medicine, Penn State University
Bingshan Li: Vanderbilt University
Xue Zhong: Division of Genetic Medicine, Vanderbilt University Medical Center
Xingyan Wang: College of Medicine, Penn State University
Xiaowei Zhan: Southern Methodist University
Galen T. Foulke: College of Medicine, Penn State University
Nancy J. Olsen: College of Medicine, Penn State University
Dajiang J. Liu: College of Medicine, Penn State University
Bibo Jiang: College of Medicine, Penn State University

Nature Communications, 2025, vol. 16, issue 1, 1-17

Abstract: Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk of progression. Biobanks typically contain too few cases to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control study data to predict disease progression risk. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples only, and than those that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction $R^{2}$, and the resulting PRS yields the strongest correlation with progression prevalence.
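
Illustrative note: the abstract describes a penalized regression that pulls model parameters toward prior PRS weights when doing so helps prediction. The sketch below shows one simple way such shrinkage toward a prior can work; it is not the authors' implementation, and the published GPS method may use a different penalty and tuning scheme. Everything here is an assumption made for illustration: the squared-error loss on a continuous outcome, the ridge-type penalty, the function names (fit_toward_prior, select_lambda), the simulated data, and the choice of penalty strength by held-out R^2.

```python
import numpy as np

def fit_toward_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2 (closed form).

    lam = 0 gives ordinary least squares; a large lam returns the prior.
    Hypothetical helper, not part of any published GPS software.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * beta_prior)

def r2(y, y_hat):
    """Coefficient of determination on a held-out set."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, grid):
    """Pick the penalty strength that maximizes validation R^2."""
    scores = [r2(y_val, X_val @ fit_toward_prior(X_tr, y_tr, beta_prior, lam))
              for lam in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

# Simulated stand-in data (assumed): a small "biobank" training set and a
# prior that is a noisy version of the true effects.
rng = np.random.default_rng(0)
n_tr, n_val, p = 80, 200, 50
beta_true = rng.normal(0.0, 0.2, p)
beta_prior = beta_true + rng.normal(0.0, 0.1, p)
X_tr = rng.normal(size=(n_tr, p))
X_val = rng.normal(size=(n_val, p))
y_tr = X_tr @ beta_true + rng.normal(size=n_tr)
y_val = X_val @ beta_true + rng.normal(size=n_val)

grid = [1e-3, 1e-1, 1.0, 10.0, 100.0, 1e3]
lam, score = select_lambda(X_tr, y_tr, X_val, y_val, beta_prior, grid)
print(f"selected lambda = {lam}, validation R^2 = {score:.3f}")
```

In this sketch the data decide how much to trust the prior. If the prior weights are informative, a larger penalty tends to be selected and the fit stays close to them. If they are not, a small penalty lets the biobank data dominate.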

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-024-55636-6 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-024-55636-6

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-55636-6

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-024-55636-6