Enhanced machine learning and hybrid ensemble approaches for Coronary Heart Disease prediction

Maurice Wanyonyi, Zakayo Ndiku Morris, Faith Mueni Musyoka and Dominic Makaa Kitavi

PLOS ONE, 2025, vol. 20, issue 12, 1-51

Abstract: Coronary heart disease (CHD) remains the leading cause of mortality worldwide, disproportionately affecting low- and middle-income countries where diagnostic resources are limited. Traditional statistical models often fail to deliver adequate predictive accuracy in complex, high-dimensional, and imbalanced health datasets. This study aimed to develop and evaluate enhanced machine learning and hybrid ensemble models for the prediction of coronary heart disease, with a focus on improving diagnostic performance, interpretability, and applicability in resource-constrained settings. We utilized a nationally representative dataset of 253,680 individuals from the Behavioral Risk Factor Surveillance System. Preprocessing included normalization and class balancing via the Synthetic Minority Oversampling Technique (SMOTE). Baseline models (Decision Trees, Random Forests, Gradient Boosting, and Support Vector Machines) were compared against improved versions: Adaptive Noise-Resistant Decision Tree (ADNRT), Hybrid Imbalanced Random Forest (HIRF), Pruned Gradient Boosting Machine (PGBM), and Enhanced Support Vector Machine (ESVM). Ensemble approaches (stacking, boosting, bagging, Bayesian model averaging, and majority voting) were implemented and evaluated using accuracy, sensitivity, specificity, and area under the curve (AUC). Calibration and learning curves were also analyzed. Enhanced models consistently outperformed their baseline counterparts. PGBM achieved the highest sensitivity (90.8%), while HIRF demonstrated the best overall calibration and balance (AUC = 0.937; sensitivity = 88.4%; specificity = 82.9%). The stacking ensemble emerged as the best-performing model, with an accuracy of 87.2%, sensitivity of 89.6%, specificity of 84.7%, and AUC of 0.94. Calibration and learning curve analyses confirmed strong generalizability and low overfitting across ensemble models. Hybrid ensemble machine learning models significantly outperform traditional classifiers in CHD prediction, offering high accuracy, robustness, and interpretability. These models present a scalable framework for implementing AI-driven diagnostic tools in low-resource environments, potentially transforming early detection and prevention of coronary heart disease.
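
For readers unfamiliar with the evaluation terms in the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are computed from binary predictions, and of how majority voting combines several classifiers. It is not the authors' code: the function names and the toy labels are assumptions made up for illustration, and the numbers it produces are unrelated to the results reported in the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = CHD and 0 = no CHD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true CHD cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


def majority_vote(predictions):
    """Combine per-model label lists; a case is positive if at least half the models say so."""
    combined = []
    for votes in zip(*predictions):
        combined.append(1 if 2 * sum(votes) >= len(votes) else 0)
    return combined


# Hypothetical toy data (not from the study): three models, each wrong on two cases.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 1, 0, 1, 0]
model_b = [1, 1, 0, 0, 0, 0, 1, 1]
model_c = [0, 1, 1, 0, 0, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
metrics = evaluate(y_true, ensemble)
print(evaluate(y_true, model_a))
print(metrics)
```

In this toy case the three models make errors on different individuals, so the vote corrects all of them. Real classifiers usually have correlated errors, so the gain from voting is typically smaller.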

Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0328338 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 28338&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0328338

DOI: 10.1371/journal.pone.0328338

 
Page updated 2025-12-28
Handle: RePEc:plo:pone00:0328338