Data-driven survival modeling for breast cancer prognostics: A comparative study with machine learning and traditional survival modeling methods

Baidoo, Theophilus Gyedu; Rodrigo, Hansapani

PLOS ONE, 2025, vol. 20, issue 4, 1-18

Abstract:
Background: This study investigates the application of data-driven survival modeling approaches to prognostic assessment of breast cancer survival. The primary objective is to evaluate and compare the ability of machine learning (ML) models and conventional survival analysis techniques to identify consistent key predictors of breast cancer survival outcomes.

Methods: This study employs data-driven survival modeling approaches to predict breast cancer survival, including survival-specific methods such as the Cox Proportional Hazards (CPH) model, Random Survival Forests (RSF), and the Cox proportional hazards deep neural network (DeepSurv), as well as machine learning models such as Random Forests (RF), XGBoost, Support Vector Machines (SVM) with an RBF kernel, and LightGBM. The dataset, sourced from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program, comprises 4,024 women diagnosed with infiltrating duct and lobular carcinoma breast cancer between 2006 and 2010. To ensure interpretability across models, the Shapley Additive Explanations (SHAP) method was applied to RSF, DeepSurv, RF, and XGBoost. This enabled the identification of key predictors of breast cancer survival, highlighting factors that were consistent across models while uncovering insights specific to each approach.

Results: The performance of the survival-specific and ML models was evaluated using the concordance index (C-index), Integrated Brier Score (IBS), mean accuracy, and mean AUC. The CPH model achieved a C-index of 0.71 ± 0.015 and an IBS of 0.08 ± 0.006, while RSF showed slightly better discriminatory power, with a C-index of 0.72 ± 0.0117. DeepSurv performed comparably, with a C-index of 0.71 ± 0.0095 and an IBS of 0.09 ± 0.0008. The CPH and RSF models both achieved the lowest IBS (0.08), indicating accurate survival probability predictions over time. Among the ML models, RF achieved a mean AUC of 0.74 ± 0.0021 and XGBoost a mean AUC of 0.69 ± 0.0183, reflecting fair discriminatory ability, although these classifiers do not account for censoring in survival data. SHAP analysis of the top-performing models highlighted the extent of lymph node involvement, Regional Node-Positive (number of affected lymph nodes), tumor grade (cell abnormality and growth rate), progesterone status, and age as key predictors of breast cancer survival outcomes.

Conclusions: While ML models such as XGBoost and RF can effectively identify important predictors and patterns in breast cancer outcomes, survival-specific methods such as the Cox model, RSF, and DeepSurv provide essential capabilities for handling time-to-event data and censoring, which makes them more suitable for accurate survival prediction. The ML models were included primarily to leverage their interpretability in identifying key variables alongside the survival-specific models, not to compete directly with them on predictive performance. By examining both ML and survival models, this research highlights the complementary strengths of each approach. The study contributes to the integration of artificial intelligence in healthcare, emphasizing the value of data-driven survival modeling in supporting healthcare professionals with accurate, personalized, and actionable insights for high-risk patients. Together, these approaches enhance the precision of survival predictions, paving the way for more informed clinical decision-making and improved patient care.
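The abstract reports a C-index and an Integrated Brier Score for the survival models without spelling out how these metrics treat censored patients. The following is a minimal pure-Python sketch, assuming the common definitions: a simplified Harrell's C-index, and a Brier score with inverse-probability-of-censoring weights (IPCW) taken from a Kaplan-Meier estimate of the censoring distribution. The function names are hypothetical and this is not the authors' code; libraries such as scikit-survival provide tested implementations.

```python
from typing import Callable, List, Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Simplified Harrell's C-index (pairs tied in time are skipped).

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when i has the higher risk score;
    tied risk scores count as one half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def censoring_km(times: Sequence[float],
                 events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate G(t) of the censoring survival function.

    Censoring (event == 0) plays the role of the 'event' here.
    """
    steps: List[Tuple[float, float]] = []
    g = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        censored = sum(1 for x, e in zip(times, events) if x == t and not e)
        g *= 1.0 - censored / at_risk
        steps.append((t, g))

    def G(t: float, left: bool = False) -> float:
        # left=True returns the left limit G(t-): only steps strictly before t
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left):
                value = v
            else:
                break
        return value

    return G


def brier_score(times: Sequence[float], events: Sequence[int],
                surv_at_t: Sequence[float], t: float,
                G: Callable[..., float]) -> float:
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S_i(t)."""
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei:
            total += s ** 2 / G(ti, left=True)   # event by t: target S = 0
        elif ti > t:
            total += (1.0 - s) ** 2 / G(t)       # still at risk: target S = 1
        # censored before t: weight 0, compensated by the IPCW weights above
    return total / len(times)


def integrated_brier_score(times: Sequence[float], events: Sequence[int],
                           surv_fn: Callable[[int, float], float],
                           grid: Sequence[float]) -> float:
    """Trapezoidal average of the Brier score over an increasing time grid.

    surv_fn(i, t) returns the model's predicted survival S_i(t) for subject i.
    Keep grid[-1] below the last follow-up time so that G(t) stays positive.
    """
    G = censoring_km(times, events)
    n = len(times)
    bs = [brier_score(times, events, [surv_fn(i, t) for i in range(n)], t, G)
          for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (bs[k] + bs[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])
```

For example, with times [2, 4, 6, 8], events [1, 1, 0, 1] and risk scores [0.9, 0.3, 0.2, 0.4], four of the five comparable pairs are concordant, giving a C-index of 0.8. Both metrics use a censored patient only for the time during which that patient was observed, which is the property the abstract contrasts with the plain AUC of the ML classifiers.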

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0318167 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 18167&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0318167

DOI: 10.1371/journal.pone.0318167

More articles in PLOS ONE from Public Library of Science