
Resampling methods for class imbalance in clinical prediction models: A scoping review protocol

Osama Abdelhay, Adam Shatnawi, Hassan Najadat and Taghreed Altamimi

PLOS ONE, 2025, vol. 20, issue 11, 1-11

Abstract:

Introduction: Class imbalance, where clinically important “positive” cases make up less than 30% of the dataset, systematically reduces the sensitivity and fairness of medical prediction models. Although data-level techniques (random oversampling, random undersampling, SMOTE) and algorithm-level approaches such as cost-sensitive learning are widely used, the empirical evidence on when these corrections actually improve model performance remains scattered across diseases and modelling frameworks. This protocol outlines a scoping review with meta-regression that will map and quantitatively summarise 15 years of research on resampling strategies for imbalanced clinical datasets, addressing a key methodological gap in reliable medical AI.

Methods and analysis: We will search MEDLINE, EMBASE, Scopus, Web of Science Core Collection, and IEEE Xplore, along with grey-literature sources (medRxiv, arXiv, bioRxiv), for primary studies (2009 to 31 December 2024) that apply at least one resampling or cost-sensitive strategy to a binary clinical prediction task with a minority-class prevalence below 30%. There will be no language restrictions. Two reviewers will screen records, extract data using a piloted form, and document the process in a PRISMA flow diagram. A descriptive synthesis will catalogue clinical domain, sample size, imbalance ratio, resampling strategy, model type, and performance metrics. Where 10 or more studies report compatible AUCs, a mixed-effects (random-effects) meta-regression of logit-transformed AUCs will examine moderators including imbalance ratio, resampling strategy, model family, and sample size. Small-study effects will be assessed with funnel plots, Egger’s test, trim-and-fill, and weight-function models; influence diagnostics and leave-one-out analyses will evaluate robustness. Because this is a methodological review, formal clinical risk-of-bias tools are optional; instead, design-level screening, influence diagnostics, and sensitivity analyses will support transparency.

Discussion: By combining a comprehensive conceptual framework with quantitative estimates, this review aims to determine when data-level versus algorithm-level balancing yields genuine improvements in discrimination, calibration, and cost-sensitive metrics across medical fields. The findings will help researchers select concise, evidence-based methods for addressing imbalance, inform journal and regulatory reporting standards, and identify research gaps, such as the under-reporting of calibration and misclassification costs, that must be addressed before balanced models can be trusted in clinical practice.

Systematic review registration: INPLASY202550026.
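As context for the data-level strategies named in the abstract, the following Python sketch (not part of the protocol; the dataset, parameters, and numbers are purely illustrative) shows how SMOTE oversampling and random undersampling are typically applied to an imbalanced binary task, and why resampling is restricted to the training split so that test-set prevalence stays realistic. It assumes scikit-learn and imbalanced-learn are installed.

# Illustrative sketch only: data-level resampling on a synthetic imbalanced task.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic binary dataset with roughly 10% minority-class prevalence
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def auc_after(sampler=None):
    """Fit a logistic model, optionally resampling the TRAINING set only."""
    if sampler is None:
        Xt, yt = X_train, y_train
    else:
        Xt, yt = sampler.fit_resample(X_train, y_train)
    model = LogisticRegression(max_iter=1000).fit(Xt, yt)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print("no resampling        AUC:", round(auc_after(), 3))
print("SMOTE oversampling   AUC:", round(auc_after(SMOTE(random_state=0)), 3))
print("random undersampling AUC:", round(auc_after(RandomUnderSampler(random_state=0)), 3))

Keeping the test split untouched is the point of the helper: resampling before the split would leak synthetic minority cases into evaluation and inflate apparent performance.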
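The planned synthesis fits a mixed-effects meta-regression on logit-transformed AUCs. The protocol does not name a software implementation; the sketch below, with fabricated study-level AUCs, standard errors, and imbalance ratios, illustrates one conventional way such a model is estimated (inverse-variance weighting with a DerSimonian-Laird, method-of-moments estimate of between-study variance) using numpy and statsmodels.

# Illustrative sketch only: random-effects meta-regression on logit-transformed AUCs.
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level inputs: AUC, its standard error, and one moderator
auc = np.array([0.72, 0.81, 0.68, 0.77, 0.85, 0.74])
se_auc = np.array([0.04, 0.03, 0.05, 0.04, 0.02, 0.03])
imbalance_ratio = np.array([0.25, 0.10, 0.05, 0.20, 0.15, 0.08])  # minority prevalence

# Logit transform; delta-method variance: var(logit AUC) ~ var(AUC) / (AUC*(1-AUC))^2
y = np.log(auc / (1 - auc))
v = se_auc**2 / (auc * (1 - auc)) ** 2

X = sm.add_constant(imbalance_ratio)   # intercept + moderator
k, p = X.shape

# Step 1: fixed-effects (inverse-variance) fit gives the residual heterogeneity Q
W = np.diag(1 / v)
fe = sm.WLS(y, X, weights=1 / v).fit()
Q = float(np.sum((fe.resid**2) / v))

# Step 2: method-of-moments estimate of between-study variance tau^2
P = W - W @ X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W
tau2 = max(0.0, (Q - (k - p)) / np.trace(P))

# Step 3: random-effects meta-regression with weights 1/(v_i + tau^2)
re = sm.WLS(y, X, weights=1 / (v + tau2)).fit()
print(f"tau^2 = {tau2:.4f}")
print(re.summary())

# Note: dedicated meta-analysis software (e.g., metafor in R) fixes the residual
# scale at 1, so the standard errors printed here are only approximate.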

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330050 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 30050&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0330050

DOI: 10.1371/journal.pone.0330050

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone.

 
Handle: RePEc:plo:pone00:0330050