Leveraging machine learning algorithm to predict minimum dietary diversity among children aged 6–23 months in Ethiopia

Naol Gonfa Serbessa, Siraj Muhidin Degefa, Beriso Alemu Hailu, Geleta Nenko Dube, Betelhem Bizuneh Asfawu, Asmamaw Ketemaw Tsehay, Eskedar Ayehu, Mulusew Andualem Asemahegn, Agmasie Damtew Wale, Eden Ketema Woldekidan, Tigist Tolessa Sedi, Asmamaw Deneke, Zehara Jemal Nuriye, Mohammedjud Hassen Ahmed and Habtamu Alganeh Guadie

PLOS Global Public Health, 2026, vol. 6, issue 2, 1-31

Abstract: Inadequate consumption of nutrient-rich foods is considered an important underlying factor affecting children's healthy development and can lead to developmental delays and various disorders. Evidence on the predictors of dietary diversity is limited. We aimed to train and test eight machine learning algorithms on Ethiopian Demographic and Health Survey (EDHS) data from 2005–2019. We used secondary data from EDHS 2005, 2011, 2016 and 2019. A total weighted sample of 8,996 children aged 6–23 months was included in the study. STATA 17 was used to extract variables from the EDHS dataset. Python 3.11 was used for data cleaning, coding, and further analysis. The machine learning algorithms used in this study were logistic regression, random forest, K-nearest neighbors (KNN), multilayer perceptron (MLP), support vector machine, naive Bayes, extreme gradient boosting (XGBoost), and AdaBoost. Furthermore, Shapley additive explanations (SHAP) were used for model interpretability and to identify the top predictors. The random forest classifier (accuracy = 82%, recall = 84.9%, precision = 78.5%, F1-score = 81.7%, area under the curve [AUC] = 89%) was the best model for predicting minimum dietary diversity among children aged 6–23 months. Minimum dietary diversity remains a significant public health issue in Ethiopia, with important regional and socioeconomic inequalities. The random forest model performed best and identified place of delivery, sex of the household head, water source, place of residence, age of the child, number of children under five years of age, women's age, and household size as the most important predictors. The results show the value of machine learning for identifying the most at-risk populations and informing targeted nutrition interventions.
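
The abstract reports accuracy, recall, precision, F1-score, and AUC for a binary classifier. As a minimal sketch of how such metrics are typically computed from true labels and predicted scores, the following uses only the Python standard library. The toy labels, scores, the 0.5 threshold, and the function names are hypothetical illustrations; they are not the EDHS data or the authors' code, and the abstract does not state which class the authors treated as positive.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, scores = predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies of this kind usually report both.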

Date: 2026

Downloads: (external link)
https://journals.plos.org/globalpublichealth/artic ... journal.pgph.0005995 (text/html)
https://journals.plos.org/globalpublichealth/artic ... 05995&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pgph00:0005995

DOI: 10.1371/journal.pgph.0005995


 
Page updated 2026-03-08
Handle: RePEc:plo:pgph00:0005995