Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type

Yifan Qin, Jinlong Wu, Wen Xiao, Kun Wang, Anbing Huang, Bowen Liu, Jingxuan Yu, Chuhao Li, Fengyu Yu and Zhanbing Ren
Additional contact information
Yifan Qin: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Jinlong Wu: College of Physical Education, Southwest University, Chongqing 400715, China
Wen Xiao: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Kun Wang: Physical Education College, Yanching Institute of Technology, Langfang 065201, China
Anbing Huang: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Bowen Liu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Jingxuan Yu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Chuhao Li: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Fengyu Yu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
Zhanbing Ren: College of Physical Education, Shenzhen University, Shenzhen 518000, China

IJERPH, 2022, vol. 19, issue 22, 1-16

Abstract: The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are good diabetes prediction tools. The purpose of this study was to compare the efficacy of five different machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999–2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was utilized. For predicting diabetes, five machine-learning models (CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five machine-learning models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes patients. In terms of model performance, CATBoost ranks higher than RF, LR, XGBoost, and SVM. The best-performing machine-learning model among the five is CATBoost, which achieves an accuracy of 82.1% and an AUC of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
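
The threshold-based metrics named in the abstract (accuracy, sensitivity, specificity, precision, F1 score) all derive from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and chosen only for illustration. The ROC curve and AUC are not covered here because they require predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall for the positive (diabetes) class
    specificity: float  # recall for the negative class
    precision: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute standard binary-classification metrics from 0/1 labels.

    Hypothetical helper for illustration; 1 denotes the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=sensitivity,
        specificity=ratio(tn, tn + fp),
        precision=precision,
        f1=ratio(2 * precision * sensitivity, precision + sensitivity),
    )


# Toy labels (made up for illustration, not NHANES data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy by favoring the majority class.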

Keywords: diabetes; machine learning; lifestyle; data-driven
JEL-codes: I I1 I3 Q Q5
Date: 2022

Downloads: (external link)
https://www.mdpi.com/1660-4601/19/22/15027/pdf (application/pdf)
https://www.mdpi.com/1660-4601/19/22/15027/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:19:y:2022:i:22:p:15027-:d:973323

IJERPH is currently edited by Ms. Jenna Liu

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:19:y:2022:i:22:p:15027-:d:973323