Evaluating the Impact of Imbalanced Data on Malaria Prediction Accuracy

Amosa, Ramoni Tirimisiyu; Abiodun, Ileladewa Adeoye; Biodun, Olorunlomerue Adam; Olatunji, Lawal Moshood; Ifeoma, Ugwu Jennifer

Affiliations:
Ramoni Tirimisiyu Amosa: Department of Computer Science, School of Applied Sciences, Federal Polytechnic Ede, Osun State, Nigeria
Ileladewa Adeoye Abiodun: Department of Computer Science, School of Applied Sciences, Federal Polytechnic Ede, Osun State, Nigeria
Olorunlomerue Adam Biodun: Department of Computer Science, School of Applied Sciences, Federal Polytechnic Ede, Osun State, Nigeria
Lawal Moshood Olatunji: Department of Computer Science, School of Applied Sciences, Federal Polytechnic Ede, Osun State, Nigeria
Ugwu Jennifer Ifeoma: Department of Computer Science, School of Applied Sciences, Federal Polytechnic Ede, Osun State, Nigeria

International Journal of Research and Innovation in Applied Science, 2025, vol. 10, issue 4, 57-65

Abstract: Malaria remains a significant global health challenge, particularly in tropical and subtropical regions. Traditional methods of malaria prediction rely on historical data and basic statistical analysis, which often lack the accuracy needed for effective disease control. In recent years, machine learning (ML) techniques have emerged as powerful tools for malaria prediction, offering improved accuracy and reliability. This study evaluates four ML models for malaria disease prediction: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Logistic Regression (LR). The dataset consists of microscopic blood sample images categorized as parasite-infected or uninfected. Because the dataset is imbalanced, three data balancing techniques (oversampling, undersampling, and data augmentation) were applied to improve model performance. The models were compared using accuracy, precision, recall, F1-score, and ROC-AUC. Random Forest with undersampling achieved the highest accuracy (79.07%) and ROC-AUC (90.24%), making it the most effective model. Oversampling and data augmentation improved recall but did not significantly improve overall performance. SVM and Logistic Regression performed consistently but lagged behind Random Forest, whereas KNN achieved high recall (97.50%) but low accuracy because it produced many false positives. The findings suggest that undersampling, particularly with Random Forest, is the most effective of the tested approaches for malaria prediction on imbalanced datasets. This study highlights the potential of machine learning to improve malaria diagnosis and resource allocation, offering useful insights for disease control strategies.
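
The abstract names the balancing strategies and evaluation metrics but gives no implementation details. The following Python sketch is a hypothetical illustration, not the authors' code. It assumes random undersampling (dropping majority-class samples), random oversampling (duplicating minority-class samples with replacement), and a binary label where 1 marks the parasitized class, and it applies the standard definitions of accuracy, precision, recall, F1, and ROC-AUC (the last computed through its rank interpretation). All function names are made up for illustration, the inputs are assumed to be precomputed feature vectors, and image augmentation is not shown.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the samples of each class label into a list (hypothetical helper)."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, n_min):  # sampling without replacement
            Xb.append(xi)
            yb.append(label)
    return Xb, yb


def random_oversample(X, y, seed=0):
    """Duplicate random samples of smaller classes until all match the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(n_max - len(samples))]
        for xi in samples + extra:
            Xb.append(xi)
            yb.append(label)
    return Xb, yb


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC as the probability that a random positive scores above a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data: 8 uninfected (0) and 2 parasitized (1) samples.
X = list(range(10))
y = [0] * 8 + [1] * 2
Xu, yu = random_undersample(X, y)  # 2 samples of each class
Xo, yo = random_oversample(X, y)   # 8 samples of each class
print(Counter(yu), Counter(yo))
```

The toy data also shows how high recall can coexist with low accuracy, the pattern the abstract reports for KNN. A degenerate classifier that labels all ten samples as parasitized gets a recall of 1.0 but an accuracy and precision of only 0.2, because every uninfected sample becomes a false positive. In practice, resampling is usually applied only to the training split, so that test metrics reflect the original class distribution. The abstract does not say how the authors handled this.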

Date: 2025

Downloads:
https://www.rsisinternational.org/journals/ijrias/ ... 10-issue-4/57-65.pdf (application/pdf)
https://rsisinternational.org/journals/ijrias/arti ... prediction-accuracy/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:bjf:journl:v:10:y:2025:i:4:p:57-65

International Journal of Research and Innovation in Applied Science is currently edited by Dr. Renu Malsaria

Bibliographic data for series maintained by Dr. Renu Malsaria.