Machine learning-based predictive modeling of angina pectoris in an elderly community-dwelling population: Results from the PoCOsteo study
Shahrokh Mousavi,
Zahrasadat Jalalian,
Sima Afrashteh,
Akram Farhadi,
Iraj Nabipour and
Bagher Larijani
PLOS ONE, 2025, vol. 20, issue 8, 1-22
Abstract:
Background: Angina pectoris, a comparatively common complaint among older adults, is a critical warning sign of underlying coronary heart disease. We aimed to develop machine learning-based models using multiple algorithms to predict angina pectoris and identify its predictors in an elderly community-dwelling population.
Methods: Medical records of 2000 participants in the PoCOsteo study between 2018 and 2021 were analyzed. The Rose Angina Questionnaire (RAQ) was used to identify angina pectoris. Preprocessing was performed using imputation and scaling methods. We developed the following models: logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), k-nearest neighbors (KNN), linear and quadratic discriminant analysis (LDA, QDA), decision tree (DT), and two ensemble models: random forest (RF) and adaptive boosting (AdaBoost). To address model complexity and parameter uncertainty, we performed hyperparameter tuning, applied ten-fold cross-validation, and compared the trade-offs between model performance and interpretability. To determine the importance of each feature as a measure of its contribution to the models' performance, we applied permutation feature importance.
Results: Participants had a mean age of 62.15 years (± 8.07), and 57.1% were female. Of the participants, 88.4% did not have angina, 3.6% had probable angina, and 8% had definite angina. The bivariate analysis revealed significant correlations between the RAQ and several other variables. LDA, RF, and LR had the highest AUC values, averaging 0.772, 0.770, and 0.764, respectively. These three models outperformed QDA (AUC 0.752), SVM (0.733), AdaBoost (0.726), KNN (0.697), MLP (0.697), and DT (0.644). Permutation feature importance identified a small set of features implicating thrombotic vascular diseases, congestive heart failure, renal failure, and anemia.
Discussion: Our study demonstrated that LDA, RF, and LR not only provided strong predictive performance but also balanced model complexity with interpretability. The superior performance of these models could be largely attributed to their ability to capture the relevant linear, nonlinear, and interaction effects inherent in the clinical data, as well as the clinical relevance of key predictors like thrombotic vascular diseases, congestive heart failure, renal failure, and anemia. Future studies could incorporate more direct diagnostic methods to test our findings further and enhance the robustness of the predictive models developed.
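Illustrative sketch (not the authors' code): the abstract ranks models by AUC and ranks features by permutation feature importance. The standard-library Python below shows how those two quantities can be computed. Everything in it is an assumption made for illustration: the data are synthetic rather than PoCOsteo records, the three unnamed features are hypothetical, a fixed linear score stands in for a fitted model, and evaluation is done in-sample rather than with the ten-fold cross-validation the study describes.

```python
import random


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every score tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC when one feature column is shuffled, for each column."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic data (assumption): the outcome depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [1 if 2 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 1) > 1.5 else 0 for r in X]


def predict(row):
    """Hypothetical stand-in for a fitted model: a fixed linear risk score."""
    return 2 * row[0] + 0.5 * row[1]


baseline, importances = permutation_importance(predict, X, y)
print("baseline AUC:", round(baseline, 3))
for idx, imp in enumerate(importances):
    print(f"feature {idx}: mean AUC drop = {imp:.3f}")
```

A feature the scoring function never reads (feature 2 here) gets an importance of exactly zero, because shuffling it leaves every prediction unchanged. In practice a library routine such as scikit-learn's `permutation_importance`, applied to held-out folds, would be a more typical choice than a hand-written loop.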
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0329023 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 29023&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0329023
DOI: 10.1371/journal.pone.0329023
More articles in PLOS ONE from Public Library of Science