Detection of rare medical events in electronic health records using machine learning: Current practices and suggestions – A scoping review
Biniyam Gebeyehu, Bennett Kleinberg, Katrijn Van Deun and Esther de Vries
PLOS ONE, 2026, vol. 21, issue 3, 1-19
Abstract:
Background: Routine healthcare data are increasingly stored in electronic health records (EHRs), presenting an opportunity to leverage machine learning (ML) for detecting and predicting medical events. While medical experts are optimistic about expanding the applications of ML, several often-overlooked caveats exist. Many medical outcomes are categorical (e.g., a diagnosis is present or absent), with categories that are considerably unequal in size, which may substantially affect the performance of ML algorithms. Detecting small subgroups in EHR data, so-called anomaly detection, is an emerging approach, yet organized documentation of current practices remains scarce. This scoping review examines medical anomaly detection based on routine healthcare data stored in EHRs and formulates alternative approaches where suboptimal practices were noticed.
Methods: PubMed and Web of Science were searched up to September 5, 2024. Peer-reviewed articles and conference papers on ML-based medical anomaly detection in EHR data were included. Fifty-two study characteristics were extracted and analyzed both quantitatively and qualitatively.
Results: A total of 117 studies met the inclusion criteria. The cross-study median proportion of the anomalous class was 0.079 (range 0.00045–0.23). Key details, such as data preprocessing steps, were often incompletely reported; 17 studies (14.5%) provided no information on preprocessing. Only four studies reported the underlying cause of missingness before deciding how to handle it, and just three considered the clinical implications of false positives and false negatives when evaluating anomaly detection performance.
Conclusion: The current medical anomaly detection literature needs to pay greater attention to reporting details on preprocessing, the handling of missing data, and the use of performance metrics. With the increasing number of anomaly detection studies based on routine healthcare data stored in EHRs, more focus is needed on implementation and reporting practices to ensure the relevance and reproducibility of future studies in this field.
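Illustration (not part of the article): the abstract's concern about unequal class sizes and the choice of performance metrics can be made concrete with a small sketch. It assumes a hypothetical cohort of 10,000 records at the reported median anomalous-class proportion of 0.079. The two confusion matrices are invented for illustration only and do not come from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (anomalous class = positive)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    @property
    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0


# Assumption: hypothetical cohort size; prevalence is the median from the abstract.
N = 10_000
PREVALENCE = 0.079
n_pos = round(N * PREVALENCE)   # records in the anomalous class
n_neg = N - n_pos               # records in the majority class

# A trivial model that never flags an anomaly.
always_negative = Confusion(tp=0, fp=0, fn=n_pos, tn=n_neg)

# Hypothetical detector: invented counts, chosen only to contrast the metrics.
hypothetical_detector = Confusion(tp=500, fp=900, fn=n_pos - 500, tn=n_neg - 900)

for name, cm in [("always negative", always_negative),
                 ("hypothetical detector", hypothetical_detector)]:
    print(f"{name}: accuracy={cm.accuracy:.3f} "
          f"recall={cm.recall:.3f} precision={cm.precision:.3f}")
```

Under these assumptions the trivial model reaches higher accuracy than the detector while finding no anomalous cases at all, which is why accuracy alone says little at this class ratio and why the clinical cost of false negatives and false positives has to be weighed separately.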
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0332963 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 32963&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0332963
DOI: 10.1371/journal.pone.0332963
More articles in PLOS ONE from Public Library of Science