Unsupervised clustering of longitudinal clinical measurements in electronic health records

Arshiya Mariam, Hamed Javidi, Emily C Zabor, Ran Zhao, Tomas Radivoyevitch and Daniel M Rotroff

PLOS Digital Health, 2024, vol. 3, issue 10, 1-20

Abstract: Longitudinal electronic health records (EHR) can be used to identify patterns of disease development and progression in real-world settings. Unsupervised temporal matching algorithms, originally developed for signal processing and protein-sequence alignment tasks, are being repurposed for EHR, where they have shown immense promise for gaining insight into disease. The robustness of these algorithms for classifying EHR clinical data remains to be determined. Time series compiled from clinical measurements, such as blood pressure, have far more irregularity in sampling and missingness than the data for which these algorithms were developed, necessitating a systematic evaluation of these methods. We applied 30 state-of-the-art unsupervised machine learning algorithms to 6,912 systematically generated simulated clinical datasets across five parameters. These algorithms included eight temporal matching algorithms with fourteen partitional and eight fuzzy clustering methods. Nemenyi tests were used to determine differences in accuracy using the Adjusted Rand Index (ARI). Dynamic time warping and its lower-bound variants had the highest accuracies across all cohorts (median ARI>0.70). All 30 methods were better at discriminating classes that differed in magnitude than classes that differed in trajectory shape. Missingness impacted accuracies only when classes differed by trajectory shape. The method with the highest ARI was then used to cluster a large pediatric metabolic syndrome (MetS) cohort (N = 43,426). We identified three unique childhood BMI patterns with high average cluster consensus (>70%). The algorithm identified a cluster with consistently high BMI which had the greatest risk of MetS, consistent with prior literature (OR = 4.87, 95% CI: 3.93–6.12). While these algorithms have been shown to have similar accuracies for regular time series, their accuracies in clinical applications vary substantially when discriminating differences in shape, especially with moderate to high missingness (>10%). This systematic assessment also shows that the most robust algorithms tested here can derive meaningful insights from longitudinal clinical data.

Author summary: Clinical data is regularly recorded in patients’ health records by healthcare institutions and is becoming increasingly available for research to identify clinically meaningful subgroups that can help drive developments in precision medicine. Clustering methods from other domains, such as audio signal processing, are being repurposed for these tasks; however, clinical data has its own unique characteristics, such as missing data and specific correlation structures, that may impact the performance of certain clustering methods. Here, using a large, simulated dataset we developed from real patient data, our objective is to establish which approaches are best at stratifying patients using longitudinal clinical data. We identified dynamic time warping (DTW) and its lower-bound variants as highly robust clustering algorithms that showed impressive performance at classifying patients based on variations in trajectory shapes and trajectory magnitudes. We also demonstrate, using a real cohort of >43,000 pediatric patients, that DTW can classify BMI trajectories to identify patients at elevated risk of developing pediatric metabolic syndrome. Our study provides insights into the robustness of these algorithms and their use in identifying novel patterns in the clinical domain.
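For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the textbook form of dynamic time warping with an absolute-difference cost (the lower-bound variants evaluated in the paper are not shown) and the standard contingency-table form of the Adjusted Rand Index. The function names and the toy BMI values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb, inf


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences of any lengths."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched again (b index repeats)
                cost[i][j - 1],      # b[j-1] is matched again (a index repeats)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def adjusted_rand_index(truth, pred):
    """ARI computed from the contingency table of two labelings."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement here
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical BMI trajectories with different numbers of visits per patient.
high_a, high_b = [30, 31, 32, 33], [29, 31, 33]
low_a = [18, 18, 19, 19, 20]

# Trajectories of similar magnitude are closer under DTW despite unequal lengths.
print(dtw_distance(high_a, high_b) < dtw_distance(high_a, low_a))

# ARI ignores how clusters are named: a relabeled perfect clustering scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because DTW aligns sequences of unequal length, a distance matrix built this way can be passed to a partitional or fuzzy clustering method without first resampling visits onto a regular grid. How missing values are handled is a separate design choice that this sketch does not address.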

Date: 2024
Downloads: (external link)
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000628 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 00628&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0000628

DOI: 10.1371/journal.pdig.0000628

More articles in PLOS Digital Health from Public Library of Science

 
Page updated 2025-05-31
Handle: RePEc:plo:pdig00:0000628