EconPapers    
OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia

Franklin Parrales-Bravo, Rosangela Caicedo-Quiroz, Elena Tolozano-Benitez, Víctor Gómez-Rodríguez, Lorenzo Cevallos-Torres, Jorge Charco-Aguirre and Leonel Vasquez-Cevallos
Additional contact information
Franklin Parrales-Bravo: Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador
Rosangela Caicedo-Quiroz: Centro de Estudios para el Cuidado Integral y la Promoción de la Salud, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador
Elena Tolozano-Benitez: Centro de Estudios en Tecnologías Aplicadas, Universidad Bolivariana del Ecuador, km 5 ½ vía Durán—Yaguachi, Durán 092405, Ecuador
Víctor Gómez-Rodríguez: Instituto Superior Tecnológico Urdesa (ITSU), Av. Pdte. Carlos Julio Arosemena Tola km 2 ½, Guayaquil 090615, Ecuador
Lorenzo Cevallos-Torres: Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador
Jorge Charco-Aguirre: Grupo de Investigación en Inteligencia Artificial, Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Guayaquil 090514, Ecuador
Leonel Vasquez-Cevallos: SIMUEES Simulation Clinic, Universidad Espíritu Santo, Samborondón 092301, Ecuador

Mathematics, 2024, vol. 12, issue 21, 1-14

Abstract: Unbalanced data can affect the machine learning (ML) algorithms used to build predictive models. This manuscript studies the influence of oversampling and undersampling strategies on the learning of Bayesian classification models that predict the risk of suffering preeclampsia. Given the properties of our dataset, only oversampling and undersampling methods that handle both numerical and categorical attributes are considered: the synthetic minority oversampling technique for nominal and continuous data (SMOTE-NC), SMOTE-Encoded Nominal and Continuous (SMOTE-ENC), random over-sampling examples (ROSE), random undersampling (UNDER), and random oversampling (OVER). According to the results, balancing the classes in the training dataset does not improve the accuracy percentages. However, on the test dataset, models built on a balanced training dataset accurately classified both positive and negative cases of preeclampsia, whereas models built on the imbalanced training dataset were poor at detecting positive cases. We conclude that although imbalanced training datasets can be addressed with oversampling and undersampling techniques before building prediction models, an improvement in model accuracy is not guaranteed. Even so, sensitivity and specificity percentages improve in most binary classification problems, such as the one addressed in this manuscript.
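The random oversampling (OVER) and random undersampling (UNDER) strategies discussed in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions (function names and the toy dataset are ours, not from the paper); note that SMOTE-NC and SMOTE-ENC instead synthesize new minority samples from nearest neighbors, which this sketch does not attempt.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """OVER: balance classes by duplicating randomly chosen minority samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())        # grow every class to the majority size
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):      # duplicate until the class reaches target
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """UNDER: balance classes by randomly discarding majority samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())        # shrink every class to the minority size
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy imbalanced dataset: 6 negative vs. 2 positive cases
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xo, yo = random_oversample(X, y)
Xu, yu = random_undersample(X, y)
print(Counter(yo))  # both classes now have 6 samples
print(Counter(yu))  # both classes now have 2 samples
```

Because both functions only duplicate or drop whole rows, they work unchanged on mixed numerical and categorical attributes, which is the dataset property the paper highlights when restricting its choice of resampling methods.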

Keywords: preeclampsia; Bayesian network classifiers; class imbalance; oversampling; undersampling; SMOTE-NC; ROSE; SMOTE-ENC (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/21/3351/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/21/3351/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:21:p:3351-:d:1506782

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3351-:d:1506782