Can Simple Balancing Algorithms Improve School Dropout Forecasting? The Case of the State Education Network of Espírito Santo, Brazil
Guilherme Armando de Almeida Pereira and
Kiara de Deus Demura
Additional contact information
Guilherme Armando de Almeida Pereira: Department of Economics, Federal University of Espírito Santo, Vitória 29075-910, Brazil
Kiara de Deus Demura: Education Center, Jones dos Santos Neves Institute, Vitória 29052-015, Brazil
Forecasting, 2025, vol. 7, issue 4, 1-19
Abstract:
This study evaluates the effect of simple data-level balancing techniques on predicting school dropout across all state public high schools in Espírito Santo, Brazil. We trained Logistic Regression with LASSO (LR), Random Forest (RF), and Naive Bayes (NB) models on first-quarter data from 2018–2019 and forecasted dropouts for 2020, with additional validation in 2022. Facing strong class imbalance, we compared three balancing methods—RUS, SMOTE, and ROSE—against models trained on the original data. Performance was assessed using accuracy, sensitivity, specificity, precision, F1, AUC, and G-mean. Results show that the imbalance severely harmed RF and NB trained without balancing, while Logistic Regression remained more stable. Overall, balancing techniques improved most metrics: RUS and ROSE were often superior, while SMOTE produced mixed results. Optimal configurations varied by year and metric, and RUS and ROSE made up most of the best combinations. Although most configurations benefited from balancing, some decreased performance; therefore, we recommend systematic testing of multiple balancing strategies and further research into SMOTE variants and algorithm-level approaches.
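As an illustration of the ideas named in the abstract, the following is a minimal sketch of random undersampling (RUS) and the G-mean metric, using only the Python standard library. It is not the authors' code. The function names, the 5%-positive toy labels, and the fixed seed are assumptions made for the example, and G-mean is assumed to follow its usual definition as the geometric mean of sensitivity and specificity.

```python
import math
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Hypothetical helper for illustration; labels are assumed to be 0/1.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (assumed): 5 dropouts (label 1) among 100 students.
toy_labels = [1] * 5 + [0] * 95
toy_rows = list(range(100))

balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)

# A classifier that always predicts "no dropout" looks accurate on the
# imbalanced data but identifies no dropouts at all.
always_negative = [0] * len(toy_labels)
accuracy = sum(t == p for t, p in zip(toy_labels, always_negative)) / len(toy_labels)
score = g_mean(toy_labels, always_negative)
```

On this toy data the always-negative predictor reaches high accuracy while its G-mean is zero, which suggests why the abstract reports sensitivity, specificity, and G-mean alongside accuracy.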
Keywords: student dropout forecasting; educational data mining; highly imbalanced classification problems; students at risk of dropping out
JEL-codes: A1 B4 C0 C1 C2 C3 C4 C5 C8 M0 Q2 Q3 Q4
Date: 2025
Downloads:
https://www.mdpi.com/2571-9394/7/4/59/pdf (application/pdf)
https://www.mdpi.com/2571-9394/7/4/59/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jforec:v:7:y:2025:i:4:p:59-:d:1774285
Forecasting is currently edited by Ms. Joss Chen
Bibliographic data for series maintained by MDPI Indexing Manager.