Detection of rare medical events in electronic health records using machine learning: Current practices and suggestions – A scoping review

Gebeyehu, Biniyam; Kleinberg, Bennett; Van Deun, Katrijn; de Vries, Esther

PLOS ONE, 2026, vol. 21, issue 3, 1-19

Abstract:
Background: Routine healthcare data are increasingly stored in electronic health records (EHRs), presenting an exciting opportunity to leverage machine learning (ML) for detecting and predicting medical events. While medical experts are optimistic about expanding ML applications, several caveats exist that are often overlooked. Many medical outcomes are categorical (e.g., a diagnosis is present or absent), with categories of considerably unequal size, which might significantly affect the performance of ML algorithms. Detecting small subgroups in EHR data, so-called anomaly detection, is an emerging approach, yet organized documentation of current practices remains scarce. This scoping review examines medical anomaly detection based on routine healthcare data stored in EHRs and formulates alternative approaches where suboptimal practices were noted.
Methods: PubMed and Web of Science were searched up to September 5, 2024. Peer-reviewed articles and conference papers on ML-based medical anomaly detection in EHR data were included. Fifty-two study characteristics were extracted and analyzed both quantitatively and qualitatively.
Results: A total of 117 studies met the inclusion criteria. The cross-study median proportion of the anomalous class was 0.079 (range 0.00045–0.23). Key details, such as data preprocessing steps, were often incompletely reported; 14.5% of studies (n = 17) provided no information on preprocessing. Only four studies reported the underlying cause of missingness before deciding how to handle it, and just three considered the clinical implications of false positives and false negatives when evaluating anomaly detection performance.
Conclusion: We identified a need for greater attention in the current medical anomaly detection literature to reporting details on preprocessing, handling of missing data, and the use of performance metrics. With the increasing number of anomaly detection studies based on routine healthcare data stored in EHRs, more focus is needed on implementation and reporting practices to ensure the relevance and reproducibility of future studies in this field.
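The abstract's points about unequal class sizes and the clinical weighting of false positives versus false negatives can be illustrated with a small worked example. This is a minimal sketch, not a method taken from the review: only the 0.079 prevalence comes from the abstract, while the cohort size, the hypothetical detector's counts, and the 20:1 cost ratio (missing an anomaly assumed to be 20 times worse than a false alarm) are illustrative assumptions.

```python
"""Sketch: raw accuracy vs. cost-weighted evaluation at the review's median
anomaly prevalence. Cohort size, detector counts and costs are hypothetical."""

from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # anomalies correctly flagged
    fp: int  # normal records wrongly flagged (false alarms)
    fn: int  # anomalies missed
    tn: int  # normal records correctly passed

    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total()

    def recall(self) -> float:  # sensitivity
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0

    def precision(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0

    def expected_cost(self, cost_fn: float, cost_fp: float) -> float:
        # Average misclassification cost per record under assumed clinical costs.
        return (cost_fn * self.fn + cost_fp * self.fp) / self.total()


N = 10_000                       # hypothetical cohort size
PREVALENCE = 0.079               # cross-study median anomalous proportion
n_pos = round(N * PREVALENCE)    # 790 anomalous records
n_neg = N - n_pos                # 9210 normal records

# Trivial model that labels every record as "normal".
always_normal = Confusion(tp=0, fp=0, fn=n_pos, tn=n_neg)

# Hypothetical detector: finds 600 of 790 anomalies at the price of 900 false alarms.
detector = Confusion(tp=600, fp=900, fn=n_pos - 600, tn=n_neg - 900)

# Assumed costs: a missed anomaly is 20x worse than a false alarm.
COST_FN, COST_FP = 20.0, 1.0

for name, cm in [("always_normal", always_normal), ("detector", detector)]:
    print(f"{name:>13}: accuracy={cm.accuracy():.3f} recall={cm.recall():.3f} "
          f"precision={cm.precision():.3f} "
          f"cost={cm.expected_cost(COST_FN, COST_FP):.3f}")
```

Under these assumptions the trivial model scores higher accuracy (0.921 vs. 0.891) yet never detects an anomaly, while the detector's expected cost per record is far lower (0.470 vs. 1.580). Which model is "better" therefore depends on the clinical cost attached to each error type, not on accuracy alone.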

Date: 2026

Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0332963 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 32963&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0332963

DOI: 10.1371/journal.pone.0332963

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone.