
XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction

Khishigsuren Davagdorj, Pham Van Huy, Nipon Theera-Umpon and Keun Ho Ryu
Additional contact information
Khishigsuren Davagdorj: Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
Pham Van Huy: Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam
Nipon Theera-Umpon: Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand
Keun Ho Ryu: Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam

IJERPH, 2020, vol. 17, issue 18, 1-22

Abstract: Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a cause of death globally. In the last decade, numerous studies have applied artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost) based framework that incorporates a hybrid feature selection (HFS) method for SiNCD prediction among the general population in South Korea and the United States. First, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis is used to obtain dissimilar features; (III) the final selection of the most representative features is made using the least absolute shrinkage and selection operator (LASSO). The selected features are then fed into the XGBoost predictive model. The experimental results show that the proposed model outperforms several existing baseline models. In addition, the proposed model reports feature importance, which enhances the interpretability of SiNCD prediction. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health.
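
The three HFS stages and the final boosted model described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the 0.05 significance level, the 0.8 correlation cutoff (pairwise correlation is used here as a simple stand-in for the multicollinearity analysis), the LASSO penalty `alpha=0.01`, and all model hyperparameters are assumptions. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different gradient boosting implementation.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:  # XGBoost is the model named in the abstract
    from xgboost import XGBClassifier as Booster
except ImportError:  # assumption: fall back to scikit-learn's gradient boosting
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def stage1_significance(X, y, is_categorical, alpha=0.05):
    """Stage I: keep features with a significant t-test or chi-square test."""
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if is_categorical[j]:
            # Contingency table: feature level (rows) by class label (columns).
            table = np.array([[np.sum((col == v) & (y == c)) for c in (0, 1)]
                              for v in np.unique(col)])
            p = stats.chi2_contingency(table)[1]
        else:
            # Welch's two-sample t-test between the two classes.
            p = stats.ttest_ind(col[y == 0], col[y == 1], equal_var=False).pvalue
        if p < alpha:
            keep.append(j)
    return keep


def stage2_decorrelate(X, cols, max_abs_corr=0.8):
    """Stage II (assumed form): drop a feature that is highly correlated
    with a feature that has already been kept."""
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, cols], rowvar=False)))
    kept = []
    for i, j in enumerate(cols):
        if all(corr[i, cols.index(k)] <= max_abs_corr for k in kept):
            kept.append(j)
    return kept


def stage3_lasso(X, y, cols, alpha=0.01):
    """Stage III: keep features with a nonzero LASSO coefficient."""
    Xs = StandardScaler().fit_transform(X[:, cols])
    coef = Lasso(alpha=alpha).fit(Xs, y).coef_
    return [j for j, c in zip(cols, coef) if c != 0.0]


# Synthetic data (hypothetical): x0 and x2 drive the label, x1 nearly
# duplicates x0, and x3 and x4 are noise.
rng = np.random.default_rng(0)
n = 600
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.05, size=n)
x2 = rng.integers(0, 2, size=n).astype(float)
x3 = rng.normal(size=n)
x4 = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([x0, x1, x2, x3, x4])
y = (2.0 * x0 + 1.5 * x2 - 0.75 + rng.normal(scale=0.5, size=n) > 0).astype(int)
is_categorical = [False, False, True, False, True]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Feature selection uses the training split only, to avoid leaking test data.
selected = stage1_significance(X_train, y_train, is_categorical)
selected = stage2_decorrelate(X_train, selected)
selected = stage3_lasso(X_train, y_train, selected)

model = Booster(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train[:, selected], y_train)
accuracy = float(np.mean(model.predict(X_test[:, selected]) == y_test))

print("selected feature indices:", selected)
print("held-out accuracy:", round(accuracy, 3))
print("feature importances:", dict(zip(selected, model.feature_importances_)))
```

The `feature_importances_` attribute printed at the end is one simple way to obtain the kind of feature ranking the abstract refers to when it mentions interpretability.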

Keywords: smoking; noncommunicable disease; feature selection; extreme gradient boosting
JEL-codes: I I1 I3 Q Q5
Date: 2020

Downloads: (external link)
https://www.mdpi.com/1660-4601/17/18/6513/pdf (application/pdf)
https://www.mdpi.com/1660-4601/17/18/6513/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:17:y:2020:i:18:p:6513-:d:410103

IJERPH is currently edited by Ms. Jenna Liu

Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:17:y:2020:i:18:p:6513-:d:410103