A DIABETES RISK PREDICTING METHOD WITH MULTI-STRATEGY COUNTERFACTUAL-BASED DATA AUGMENTATION

Wang, Chen; Liu, Yan-Yi; Diao, Zhao-Shuo; Tang, Jia-Wei; Wen, Ying-You; Yang, Xiao-Tao

Chen Wang, Yan-Yi Liu, Zhao-Shuo Diao, Jia-Wei Tang, Ying-You Wen and Xiao-Tao Yang
Additional contact information
Chen Wang: School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China; Neusoft Institute of Intelligent Medical Research, Shenyang 110179, P. R. China
Yan-Yi Liu: School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China
Zhao-Shuo Diao: School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China
Jia-Wei Tang: School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China
Ying-You Wen: School of Computer Science and Engineering, Northeastern University, Shenyang 110004, P. R. China; Neusoft Institute of Intelligent Medical Research, Shenyang 110179, P. R. China
Xiao-Tao Yang: The First Affiliated Hospital of China Medical University, Shenyang 110001, P. R. China

FRACTALS (fractals), 2023, vol. 31, issue 06, 1-17

Abstract: Diabetes is a chronic disease that poses a serious threat to health, and its early risk prediction has been a hot research topic in the field of medical artificial intelligence. Routine medical checkups are the most common way to monitor people's health status, and the data from medical checkups contain rich diagnostic information, which is valuable for diabetes risk prediction. Currently, most of the available studies on diabetes risk prediction are based on publicly available datasets, and their models and algorithms do not work well on real clinical datasets. Real routine checkup data are characterized by complex information, diverse features, high redundancy and poor balance, which pose great challenges for diabetes risk prediction. To address this problem, this paper proposes a diabetes risk prediction method based on multi-strategy data augmentation. After data pre-processing and feature selection, a counterfactual-based data balancing strategy is used to augment the minority class of instances, and a density clustering-based supplemental counterfactual data augmentation strategy is proposed to address the insufficient representation of the instances generated by the counterfactual method. Moreover, an uncertainty-weighted method is used in the model training phase. On a real checkup dataset, five machine learning methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, Random Forest and Gradient Boosting, are used to build models, and 5-fold cross-validation is used to carry out diabetes risk assessment and prediction. The experimental results show that the sensitivity and precision of the models are significantly improved compared with existing methods, and that the sensitivity of the LR model for diabetes risk prediction on the real routine checkup dataset exceeds 90%, which meets the requirements of clinical application.
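
The evaluation protocol named in the abstract (5-fold cross-validation, scored by sensitivity and precision) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic one-feature data, the `fit_threshold` stand-in classifier and all function names are assumptions made for illustration, and the paper's data augmentation, uncertainty weighting and five actual models are not reproduced. Sensitivity is TP / (TP + FN) and precision is TP / (TP + FP), with the diabetic class taken as positive.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds

def sensitivity_precision(y_true, y_pred):
    """Return (sensitivity, precision) with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return sens, prec

def fit_threshold(xs, ys):
    """Hypothetical stand-in classifier: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(xs, ys, k=5, seed=0):
    """Mean sensitivity and precision over k stratified folds."""
    folds = stratified_folds(ys, k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(ys)) if i not in held]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        preds = [1 if xs[i] > thr else 0 for i in held_out]
        scores.append(sensitivity_precision([ys[i] for i in held_out], preds))
    mean_sens = sum(s for s, _ in scores) / k
    mean_prec = sum(p for _, p in scores) / k
    return mean_sens, mean_prec

# Synthetic, imbalanced toy data (assumption): 20 positives, 180 negatives,
# with a single made-up feature that is higher on average for positives.
rng = random.Random(42)
ys = [1] * 20 + [0] * 180
xs = [rng.gauss(7.5, 1.0) if y == 1 else rng.gauss(5.5, 1.0) for y in ys]

mean_sens, mean_prec = cross_validate(xs, ys)
print(f"mean sensitivity={mean_sens:.3f}, mean precision={mean_prec:.3f}")
```

Stratifying the folds matters for imbalanced data of this kind: without it, a fold can end up with very few positive instances, which makes the per-fold sensitivity unstable.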

Keywords: Machine Learning; Counterfactual; Data Augmentation; Risk Prediction; Uncertainty
Date: 2023
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0218348X23401060
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:fracta:v:31:y:2023:i:06:n:s0218348x23401060

DOI: 10.1142/S0218348X23401060

FRACTALS (fractals) is currently edited by Tara Taylor

Bibliographic data for series maintained by Tai Tone Lim.