EconPapers    
Economics at your fingertips  
 

Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022

Wei Fang, Ying Liu, Chun Xu, Xingguang Luo and Kesheng Wang ()
Additional contact information
Wei Fang: West Virginia Clinical and Translational Science Institute, Morgantown, WV 26506, USA
Ying Liu: Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN 37614, USA
Chun Xu: Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
Xingguang Luo: Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06516, USA
Kesheng Wang: Department of Biobehavioral Health & Nursing Science, College of Nursing, University of South Carolina, Columbia, SC 29208, USA

IJERPH, 2024, vol. 21, issue 11, 1-14

Abstract: Feature selection is essentially the process of picking informative and relevant features from a larger collection of features. Few studies have focused on predictors for current e-cigarette use among U.S. adults using feature selection and machine learning (ML) approaches. This study aimed to perform feature selection and develop ML approaches in prediction of current e-cigarette use using the 2022 Health Information National Trends Survey (HINTS 6). The Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were used to perform feature selection of 71 variables. The random oversampling example (ROSE) method was utilized to deal with imbalance data. Five ML tools including support vector machines (SVMs), logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost) were applied to develop ML models. The overall prevalence of current e-cigarette use was 4.3%. Using the overlapped 15 variables selected by Boruta and LASSO, the RF algorithm provided the best classifier with an accuracy of 0.992, sensitivity of 0.985, F1 score of 0.991, and AUC of 0.999. Weighted logistic regression further confirmed that age, education level, smoking status, belief in the harm of e-cigarette use, binge drinking, belief in alcohol increasing cancer, and the Patient Health Questionnaire-4 (PHQ4) score were associated with e-cigarette use. This study confirmed the strength of ML techniques in survey data, and the findings will guide inquiry into behaviors and mentalities of substance users.

Keywords: e-cigarette use; beliefs; binge drinking; PHQ4; feature selection; machine learning (search for similar items in EconPapers)
JEL-codes: I I1 I3 Q Q5 (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/1660-4601/21/11/1474/pdf (application/pdf)
https://www.mdpi.com/1660-4601/21/11/1474/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:21:y:2024:i:11:p:1474-:d:1514993

Access Statistics for this article

IJERPH is currently edited by Ms. Jenna Liu

More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:21:y:2024:i:11:p:1474-:d:1514993