School dropout prediction and feature importance exploration in Malawi using household panel data: machine learning approach
Hazal Colak Oz,
Çiçek Güven and
Gonzalo Nápoles
Additional contact information
Hazal Colak Oz: Development Analytics
Çiçek Güven: Tilburg University
Gonzalo Nápoles: Tilburg University
Journal of Computational Social Science, 2023, vol. 6, issue 1, No 7, 245-287
Abstract:
Designing early warning systems through machine learning (ML) models to identify students at risk of dropout can improve targeting mechanisms and lead to efficient social policy interventions in education. School dropout is a culmination of various factors that drive children to leave school, and timely policy responses are most needed to address these underlying factors and improve school retention of children over time. However, applying ML approaches to school dropout prediction is an important challenge, especially in low-income countries, where data collection and management systems are relatively more prone to financial and technical constraints. For this reason, this study suggests using already collected household panel data to predict the probability of school dropout and explore feature importance for primary school children in Malawi through ML models. A rich set of variables is obtained in this study from the household data and used to build Random Forest (RF), least absolute shrinkage and selection operator (LASSO), Ridge and multilayer neural network (MNN) models. The study further explores how performance metrics differ when we embed the training samples' weights representing frequency in sampling design into the cost function of these ML models to discuss the implications of using household data in computational social science. LASSO and MNN models trained with sample weights become more prominent due to their higher recall rates of 80.6% and 78.8%, respectively. Compared to the baseline model trained with sample weights, the recall rate gained is roughly 56 percentage points using LASSO and 54 percentage points using MNN. Also, comparing LASSO and MNN trained with and without sample weights reveals that training models with sample weights increases the recall rate by roughly 11 percentage points for LASSO and 12 percentage points for MNN.
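As a rough illustration of what embedding sample weights into a cost function can look like, the sketch below computes a sample-weighted binary cross-entropy with an optional L1 (LASSO-style) penalty, plus a recall score that can also be weighted. It is not the authors' implementation: the function names, the toy labels and the toy weights are hypothetical, and weighting the recall metric itself is an assumption made here for illustration only.

```python
import math


def weighted_log_loss(y_true, p_pred, weights, coefs=(), l1=0.0):
    """Sample-weighted binary cross-entropy with an optional L1 penalty.

    Each observation's loss term is multiplied by its survey weight, so
    households that represent more of the population count for more.
    """
    eps = 1e-12  # keeps log() away from 0
    total_weight = sum(weights)
    loss = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1.0 - eps)
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total_weight + l1 * sum(abs(c) for c in coefs)


def recall(y_true, y_pred, weights=None):
    """Share of true dropouts (label 1) that the model flags, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(y_true)
    true_pos = sum(w for y, yh, w in zip(y_true, y_pred, weights) if y == 1 and yh == 1)
    false_neg = sum(w for y, yh, w in zip(y_true, y_pred, weights) if y == 1 and yh == 0)
    denom = true_pos + false_neg
    return true_pos / denom if denom else 0.0


# Hypothetical toy data: 1 = dropped out, 0 = stayed in school.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
weights = [3.0, 1.0, 2.0, 1.0, 1.0]  # made-up sampling weights

unweighted_recall = recall(y_true, y_pred)
weighted_recall = recall(y_true, y_pred, weights)
print(f"recall without weights: {unweighted_recall:.3f}")
print(f"recall with weights:    {weighted_recall:.3f}")
```

In this toy case the missed dropout carries a small weight, so the weighted recall is higher than the unweighted one; with different weights the ordering could reverse.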
Lastly, the paper provides a comprehensive and unified approach to better interpret the models using a game-theoretic approach – SHapley Additive exPlanations (SHAP) – to quantify feature importance. As a result, socio-economic characteristics of children, such as working in household farming and father's education level, are among the most important features contributing to the probability of school dropout in ML models. This study argues that the weighted sample structure of household data and its wide range of variables explored through the SHAP method for feature importance can enrich the literature and yield valuable results to harness data science for society.
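The game-theoretic idea behind SHAP can be sketched with an exact Shapley value computation on a tiny model. The example below is only an illustration under stated assumptions: the three feature names, the toy scoring function and the all-zero baseline are hypothetical and are not taken from the paper, and practical SHAP tooling typically approximates these values against a background dataset rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features.

    value_fn takes a frozenset of feature indices that are "present" and
    returns the model output for that coalition.
    """
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical features: 0 = works_on_farm, 1 = father_education, 2 = household_size
def toy_model(x):
    """Made-up dropout risk score with one interaction term."""
    return 0.8 * x[0] - 0.5 * x[1] + 0.4 * x[0] * x[2]


instance = [1.0, 2.0, 1.0]   # the child being explained (made-up values)
baseline = [0.0, 0.0, 0.0]   # reference point standing in for "feature absent"


def value_fn(coalition):
    # Features in the coalition take the instance's value, the rest the baseline's.
    x = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return toy_model(x)


phi = shapley_values(value_fn, 3)
for name, contribution in zip(["works_on_farm", "father_education", "household_size"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the instance and for the baseline, which is the additivity property that makes Shapley-based attributions comparable across features.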
Keywords: Machine learning; Feature importance; School dropout prediction; Sample weights; Educational data mining
Date: 2023
Downloads:
http://link.springer.com/10.1007/s42001-022-00195-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jcsosc:v:6:y:2023:i:1:d:10.1007_s42001-022-00195-3
Ordering information: This journal article can be ordered from
http://www.springer. ... iences/journal/42001
DOI: 10.1007/s42001-022-00195-3
Journal of Computational Social Science is currently edited by Takashi Kamihigashi
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.