
Inferring multimodal latent topics from electronic health records

Yue Li, Pratheeksha Nair, Xing Han Lu, Zhi Wen, Yuening Wang, Amir Ardalan Kalantari Dehaghi, Yan Miao, Weiqi Liu, Tamas Ordog, Joanna M. Biernacka, Euijung Ryu, Janet E. Olson, Mark A. Frye, Aihua Liu, Liming Guo, Ariane Marelli, Yuri Ahuja, Jose Davila-Velderrain and Manolis Kellis
Additional contact information
Yue Li: McGill University
Pratheeksha Nair: McGill University
Xing Han Lu: McGill University
Zhi Wen: McGill University
Yuening Wang: McGill University
Amir Ardalan Kalantari Dehaghi: McGill University
Yan Miao: McGill University
Weiqi Liu: McGill University
Tamas Ordog: Department of Medicine, and Center for Individualized Medicine
Joanna M. Biernacka: Mayo Clinic
Euijung Ryu: Mayo Clinic
Janet E. Olson: Mayo Clinic
Mark A. Frye: Mayo Clinic
Aihua Liu: McGill Adult Unit for Congenital Heart Disease Excellence (MAUDE Unit)
Liming Guo: McGill Adult Unit for Congenital Heart Disease Excellence (MAUDE Unit)
Ariane Marelli: McGill Adult Unit for Congenital Heart Disease Excellence (MAUDE Unit)
Yuri Ahuja: Massachusetts Institute of Technology
Jose Davila-Velderrain: Massachusetts Institute of Technology
Manolis Kellis: Massachusetts Institute of Technology

Nature Communications, 2020, vol. 11, issue 1, 1-17

Abstract: Electronic health records (EHR) are rich heterogeneous collections of patient health information, whose broad adoption provides clinicians and researchers unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe superior prediction accuracy of diagnostic codes and lab test imputations compared to the state-of-the-art methods. We leverage the inferred patient topic mixtures to classify target diseases and predict mortality of patients in critical conditions. In all comparisons, MixEHR confers competitive performance and reveals meaningful disease-related topics.

Date: 2020

Downloads: (external link)
https://www.nature.com/articles/s41467-020-16378-3 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16378-3

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-16378-3

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16378-3