Applying machine learning to predict stunting in children under 5 years old based on water, sanitation and hygiene behaviors and infrastructure

Sanaya Sinharoy, Heather Reese, Thomas Clasen and Sheela S Sinharoy

PLOS ONE, 2026, vol. 21, issue 3, 1-19

Abstract:

Objective: Child stunting continues to pose a substantial global health challenge, requiring multifaceted strategies that combine conventional epidemiological approaches with advanced analytic methods. The aim of this study was to determine the most effective machine learning model for predicting stunting based on water, sanitation, and hygiene behaviors and infrastructure, with the goal of identifying high-risk children who would benefit most from targeted interventions.

Methods: This study was a secondary analysis of data from a matched cohort study assessing the effectiveness of combined on-premise piped water and improved sanitation for improved health outcomes in rural Odisha, India. Data for the parent study were collected from 2,398 households with a child under five years of age across 90 villages, and complete data were available for 1,196 children. Four feature engineering techniques were used to identify the most relevant predictors: structural equation modeling, forward selection, backward elimination, and the least absolute shrinkage and selection operator (LASSO). Five machine learning algorithms commonly used for binary classification tasks were compared: logistic regression, classification tree, support vector machine, neural network, and extreme gradient boosting.

Results: Among the 1,196 children analyzed, the extreme gradient boosting model with forward selection feature engineering best predicted stunting based on water, sanitation, and hygiene (WaSH) factors. It correctly identified 81% of stunted children and 92% of non-stunted children, with an overall accuracy of 88%. The model’s area under the receiver operating characteristic curve (AUROC) was 0.959 (95% CI: 0.949–0.968), indicating that WaSH factors strongly predict child stunting when analyzed using this advanced machine learning technique. Four WaSH factors were identified as having the strongest power to predict stunting in the sample: improved sanitation coverage, presence of a handwashing station, piped water coverage, and availability of the preferred drinking water source.

Conclusions: The results demonstrate the efficacy of machine learning algorithms, especially extreme gradient boosting, to potentially inform targeted WaSH interventions for reducing childhood stunting in resource-limited settings. However, these findings require external validation in other populations, and the complete-case analysis approach (excluding 35% of children with missing data) may limit generalizability to settings with less systematic data collection.
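The Results report four evaluation measures: the share of stunted children correctly identified (sensitivity), the share of non-stunted children correctly identified (specificity), overall accuracy, and AUROC. As a minimal sketch of how such measures are typically computed for a binary classifier, the Python below uses only the standard library. The labels, scores, the 0.5 threshold, and all function names are hypothetical illustrations; none of them come from the study's data or code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, where 1 = stunted and 0 = not stunted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly stunted children that the model flags as stunted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-stunted children that the model labels as non-stunted."""
    return tn / (tn + fp)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all children classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case receives a higher
    score than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): true labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, fn, tn, fp))
print("AUROC:", auroc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why an abstract may report both kinds of measure side by side.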

Date: 2026

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0343796 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 43796&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0343796

DOI: 10.1371/journal.pone.0343796


More articles in PLOS ONE from Public Library of Science
Page updated 2026-03-08
Handle: RePEc:plo:pone00:0343796