EconPapers    
Economics at your fingertips  
 

Predicting Hepatitis B Virus Infection Based on Health Examination Data of Community Population

Ying Wang, Zhicheng Du, Wayne R. Lawrence, Yun Huang, Yu Deng and Yuantao Hao
Additional contact information
Ying Wang: Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
Zhicheng Du: Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
Wayne R. Lawrence: Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, State University of New York, Rensselaer, New York, NY 12144, USA
Yun Huang: Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
Yu Deng: Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
Yuantao Hao: Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China

IJERPH, 2019, vol. 16, issue 23, 1-13

Abstract: Despite a decline in the prevalence of hepatitis B in China, the disease burden remains high. Large populations unaware of infection risk often fail to meet the ideal treatment window, resulting in poor prognosis. The purpose of this study was to develop and evaluate models identifying high-risk populations who should be tested for hepatitis B surface antigen. Data came from a large community-based health screening, including 97,173 individuals, with an average age of 54.94. A total of 33 indicators were collected as model predictors, including demographic characteristics, routine blood indicators, and liver function. Borderline-Synthetic minority oversampling technique (SMOTE) was conducted to preprocess the data and then four predictive models, namely, the extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and logistic regression (LR) algorithms, were developed. The positive rate of hepatitis B surface antigen (HBsAg) was 8.27%. The area under the receiver operating characteristic curves for XGBoost, RF, DT, and LR models were 0.779, 0.752, 0.619, and 0.742, respectively. The Borderline-SMOTE XGBoost combined model outperformed the other models, which correctly predicted 13,637/19,435 cases (sensitivity 70.8%, specificity 70.1%), and the variable importance plot of XGBoost model indicated that age was of high importance. The prediction model can be used to accurately identify populations at high risk of hepatitis B infection that should adopt timely appropriate medical treatment measures.

Keywords: hepatitis B virus; machine learning; prediction (search for similar items in EconPapers)
JEL-codes: I I1 I3 Q Q5 (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/1660-4601/16/23/4842/pdf (application/pdf)
https://www.mdpi.com/1660-4601/16/23/4842/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:16:y:2019:i:23:p:4842-:d:293125

Access Statistics for this article

IJERPH is currently edited by Ms. Jenna Liu

More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:16:y:2019:i:23:p:4842-:d:293125