An empirical examination of classification algorithms and resampling strategies for dealing with imbalanced datasets: a comparative analysis
Himani S. Deshpande and Leena Ragha
International Journal of Data Analysis Techniques and Strategies, 2025, vol. 17, issue 3, 238-253
Abstract:
Imbalanced datasets can lead to biased models and inaccurate predictions, which makes data imbalance a crucial issue to address. This research comprehensively analyses the issues, approaches and evaluation parameters involved in building machine learning models on imbalanced datasets. The literature groups data imbalance handling methods into three broad categories, namely pre-processing methods, cost-sensitive learning, and ensemble methods. Experiments test popular classifiers in combination with three pre-processing methods, namely clustered SMOTE, random over-sampling, and scaled values, on seven standard imbalanced datasets. The results show that the Random Forest classifier combined with random over-sampling performed best on most of the datasets, with precision values from 0.68 to 1, AUC values from 0.83 to 1, and prediction accuracy from 76.1% to 99.8%. This study highlights that the choice of evaluation metric and of pre-processing method can have a significant impact on classifier performance.
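The two ideas the abstract leans on, random over-sampling as a pre-processing step and the dependence of conclusions on the evaluation metric, can be illustrated with a small sketch. This is not the authors' experimental code: it uses only the Python standard library, the function names (`random_over_sample`, `precision_recall_accuracy`, `roc_auc`) are hypothetical, and clustered SMOTE and Random Forest are not shown. Libraries such as imbalanced-learn (`RandomOverSampler`) and scikit-learn provide production versions of these steps.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Random over-sampling: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest one.
    Should be applied to the training split only, to avoid leaking
    duplicated rows into the test data."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Precision and recall for the positive (minority) class, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(pairs)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 95 majority-class rows (label 0), 5 minority-class rows (label 1)
    X = [[float(i)] for i in range(100)]
    y = [0] * 95 + [1] * 5
    _, y_balanced = random_over_sample(X, y)
    print(sorted(Counter(y_balanced).items()))  # [(0, 95), (1, 95)]

    # A "classifier" that always predicts the majority class with a constant score
    always_zero = [0] * len(y)
    print(precision_recall_accuracy(y, always_zero))  # (0.0, 0.0, 0.95)
    print(roc_auc(y, [0.0] * len(y)))  # 0.5
```

The trivial majority-class predictor reaches 95% accuracy while its precision, recall and AUC show it has learned nothing about the minority class, which is why accuracy alone is a weak guide on imbalanced data.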
Keywords: imbalanced data; over sampling; undersampling; classification; cost sensitive; ensemble learning; feature weighing; instance weighing.
Date: 2025
Downloads: http://www.inderscience.com/link.php?id=148563 (text/html). Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:17:y:2025:i:3:p:238-253
Publisher: Inderscience Enterprises Ltd (International Journal of Data Analysis Techniques and Strategies)
Bibliographic data for series maintained by Sarah Parker.