Machine learning-based prediction of metabolic dysfunction-associated steatotic liver disease using National Health and Nutrition Examination Survey (NHANES) data
Yong Zhang,
Xiang Liu,
Xingqiang Zhang,
Yangfan Fei and
Xiaoxu Li
PLOS ONE, 2025, vol. 20, issue 11, 1-13
Abstract:
Objective: With the global increase in obesity rates and lifestyle changes, metabolic dysfunction-associated steatotic liver disease (MASLD) has become a prevalent chronic liver disorder, affecting approximately 25% of the global population. The disease can progress to cirrhosis and liver cancer, posing a significant threat to public health. To facilitate early diagnosis and intervention, this study aims to develop an efficient and reliable prediction model for MASLD using machine learning algorithms.
Methods: This study drew on 9,232 participants aged 20 years and older from the 2017–2020 National Health and Nutrition Examination Survey (NHANES). After excluding individuals with frequent alcohol consumption, those with hepatitis B/C infection, those lacking liver ultrasound examinations, and samples with missing data, 2,460 subjects were included in the final analysis. The dataset was split into training and testing sets in an 80:20 ratio. Five machine learning algorithms, including XGBoost, Random Forest (RF), and Logistic Regression (LR), were used to build prediction models, and Recursive Feature Elimination (RFE) was employed to identify key predictive factors.
Results: Comparison of the five algorithms showed that XGBoost performed best. Twelve key features were selected through RFE, and the model achieved an AUC of 0.8740 on the testing set, demonstrating excellent predictive accuracy and discriminative ability. SHAP plot analysis of the model showed that waist circumference, BMI, and other factors played a pivotal role in the prediction of MASLD.
Conclusion: The prediction model developed using the XGBoost algorithm and the 12 selected features demonstrates high efficiency and stability in assessing MASLD risk. The model offers innovative technical solutions and data-driven support for the early clinical identification of high-risk populations, with the potential to optimize and refine MASLD prevention and control strategies.
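The evaluation steps named in the abstract (an 80:20 split and an AUC on held-out data) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `split_80_20` and `roc_auc`, the seed, and the toy label/score pairs are all hypothetical, and no NHANES data or XGBoost model is involved. The AUC is computed here by its pairwise definition, the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80:20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical (label, risk score) pairs for illustration only.
toy = [(1, 0.9), (0, 0.2), (1, 0.6), (0, 0.4), (1, 0.3),
       (0, 0.1), (1, 0.8), (0, 0.5), (0, 0.35), (1, 0.7)]

train, test = split_80_20(toy, seed=42)
print(len(train), len(test))

# In a real workflow the scores would come from a model fitted on `train`
# and the AUC would be computed on `test`; the toy set is scored whole here
# because two held-out rows are too few for a meaningful AUC.
print(roc_auc([y for y, _ in toy], [s for _, s in toy]))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; library implementations typically use a rank-based computation instead.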
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335656 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 35656&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0335656
DOI: 10.1371/journal.pone.0335656