Evaluating the three-level approach of the U-smile method for imbalanced binary classification
Barbara Więckowska, Katarzyna B Kubiak and Przemysław Guzik
PLOS ONE, 2025, vol. 20, issue 4, 1-30
Abstract:
Real-life binary classification problems often involve imbalanced datasets, where the majority class outnumbers the minority class. We previously developed the U-smile method, which comprises the U-smile plot and the BA, RB and I coefficients, to assess the usefulness of a new variable added to a reference prediction model and validated it under class balance. In this study, we evaluated the U-smile method under class imbalance, proposed a three-level approach of the U-smile method, and used the I coefficients as a weighting factor for point size in the U-smile plots of the BA and RB coefficients. Using real data from the Heart Disease dataset and generated random variables, we built logistic regression models to assess four new variables added to the reference model (nested setting). These models were evaluated at seven pre-defined imbalance levels of 1%, 10%, 30%, 50%, 70%, 90% and 99% of the event class. The results of the U-smile method were compared to those of certain traditional measures: Brier skill score, net reclassification index, difference in F1-score, difference in Matthews correlation coefficient, difference in the area under the receiver operating characteristic curve of the new and reference models, and the likelihood-ratio test. The reference model overfitted to the majority class at higher imbalance levels. The BA-RB-I coefficients of the U-smile method identified informative variables across the entire imbalance range. At higher imbalance levels, the U-smile method indicated both prediction improvement in the minority class (positive BA and I coefficients) and reduction in overfitting to the majority class (negative RB coefficients). The U-smile method outperformed traditional evaluation measures across most of the imbalance range. It proved highly effective in variable selection for imbalanced binary classification, making it a useful tool for real-life problems, where imbalanced datasets are prevalent.
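Three of the traditional measures named in the abstract (Brier skill score, difference in F1-score, difference in Matthews correlation coefficient) can be illustrated with a minimal sketch. This is not the authors' code and does not compute the BA, RB or I coefficients of the U-smile method. The labels and predicted probabilities below are hypothetical toy values, not data from the Heart Disease dataset; the 0.5 classification threshold, the use of the reference model as the baseline in the Brier skill score, and all function names are assumptions made for illustration.

```python
from math import sqrt


def brier_score(y_true, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)


def brier_skill_score(y_true, p_ref, p_new):
    """Relative Brier improvement of the new model over the reference model.

    Assumption: the reference model is used as the baseline forecast.
    """
    return 1.0 - brier_score(y_true, p_new) / brier_score(y_true, p_ref)


def confusion(y_true, p, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y_true, p):
        pred = 1 if pi >= threshold else 0
        if pred == 1 and yi == 1:
            tp += 1
        elif pred == 1 and yi == 0:
            fp += 1
        elif pred == 0 and yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    """F1-score; defined as 0 when there are no true or predicted events."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy sample: 2 events among 10 observations (20% event class).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities from a reference model and from a
# new model that adds one variable (nested setting).
p_ref = [0.3, 0.2, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.3]
p_new = [0.7, 0.4, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.1]

bss = brier_skill_score(y, p_ref, p_new)
delta_f1 = f1_score(*confusion(y, p_new)) - f1_score(*confusion(y, p_ref))
delta_mcc = mcc(*confusion(y, p_new)) - mcc(*confusion(y, p_ref))

print(f"BSS={bss:.3f}  dF1={delta_f1:.3f}  dMCC={delta_mcc:.3f}")
```

In this toy sample the reference model never predicts an event at the 0.5 threshold, so its F1-score and Matthews correlation coefficient are both set to 0 by convention, and any correctly detected event in the new model shows up as a positive difference.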
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0321661 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 21661&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0321661
DOI: 10.1371/journal.pone.0321661
More articles in PLOS ONE from Public Library of Science