Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling
Erhan Bilal,
Janusz Dutkowski,
Justin Guinney,
In Sock Jang,
Benjamin A Logsdon,
Gaurav Pandey,
Benjamin A Sauerwine,
Yishai Shimoni,
Hans Kristian Moen Vollan,
Brigham H Mecham,
Oscar M Rueda,
Jorg Tost,
Christina Curtis,
Mariano J Alvarez,
Vessela N Kristensen,
Samuel Aparicio,
Anne-Lise Børresen-Dale,
Carlos Caldas,
Andrea Califano,
Stephen H Friend,
Trey Ideker,
Eric E Schadt,
Gustavo A Stolovitzky and
Adam A Margolin
PLOS Computational Biology, 2013, vol. 9, issue 5, 1-16
Abstract:
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.Author Summary: We developed an extensible software framework for sharing molecular prognostic models of breast cancer survival in a transparent collaborative environment and subjecting each model to automated evaluation using objective metrics. The computational framework presented in this study, our detailed post-hoc analysis of hundreds of modeling approaches, and the use of a novel cutting-edge data resource together represents one of the largest-scale systematic studies to date assessing the factors influencing accuracy of molecular-based prognostic models in breast cancer. Our results demonstrate the ability to infer prognostic models with accuracy on par or greater than previously reported studies, with significant performance improvements by using state-of-the-art machine learning approaches trained on clinical covariates. Our results also demonstrate the difficultly in incorporating molecular data to achieve substantial performance improvements over clinical covariates alone. However, improvement was achieved by combining clinical feature data with intelligent selection of important molecular features based on domain-specific prior knowledge. We observe that ensemble models aggregating the information across many diverse models achieve among the highest scores of all models and systematically out-perform individual models within the ensemble, suggesting a general strategy for leveraging the wisdom of crowds to develop robust predictive models.
Date: 2013
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003047 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 03047&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1003047
DOI: 10.1371/journal.pcbi.1003047
Access Statistics for this article
More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().