Deep representation learning for clustering longitudinal survival data from electronic health records
Jiajun Qiu,
Yao Hu,
Li Li,
Abdullah Mesut Erzurumluoglu,
Ingrid Braenne,
Charles Whitehurst,
Jochen Schmitz,
Jatin Arora,
Boris Alexander Bartholdy,
Shrey Gandhi,
Pierre Khoueiry,
Stefanie Mueller,
Boris Noyvert,
Zhihao Ding,
Jan Nygaard Jensen and
Johann Jong
Additional contact information
Jiajun Qiu: Boehringer Ingelheim Pharma GmbH & Co. KG
Yao Hu: Boehringer Ingelheim Pharma GmbH & Co. KG
Li Li: Boehringer Ingelheim Pharma GmbH & Co. KG
Abdullah Mesut Erzurumluoglu: Boehringer Ingelheim Pharma GmbH & Co. KG
Ingrid Braenne: Boehringer Ingelheim Pharma GmbH & Co. KG
Charles Whitehurst: Boehringer-Ingelheim
Jochen Schmitz: Boehringer-Ingelheim
Jatin Arora: Boehringer Ingelheim Pharma GmbH & Co. KG
Boris Alexander Bartholdy: Boehringer Ingelheim Pharma GmbH & Co. KG
Shrey Gandhi: Boehringer Ingelheim Pharma GmbH & Co. KG
Pierre Khoueiry: Boehringer Ingelheim Pharma GmbH & Co. KG
Stefanie Mueller: Boehringer Ingelheim Pharma GmbH & Co. KG
Boris Noyvert: Boehringer Ingelheim Pharma GmbH & Co. KG
Zhihao Ding: Boehringer Ingelheim Pharma GmbH & Co. KG
Jan Nygaard Jensen: Boehringer Ingelheim Pharma GmbH & Co. KG
Johann Jong: Boehringer Ingelheim Pharma GmbH & Co. KG
Nature Communications, 2025, vol. 16, issue 1, 1-14
Abstract:
Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data as extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn’s disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn’s disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.
Date: 2025
Downloads:
https://www.nature.com/articles/s41467-025-56625-z Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56625-z
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-56625-z
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.