Study of Integrated Heterogeneous Data Reveals Prognostic Power of Gene Expression for Breast Cancer Survival
Richard E Neapolitan and
Xia Jiang
PLOS ONE, 2015, vol. 10, issue 2, 1-16
Abstract:
Background: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction. However, the RSF model has been shown to significantly outperform the Cox model. 2) Testing was not done to see if a complete set of clinical predictors could predict as well as the gene expression signatures. Methodology/Findings: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we used both clinical data and gene expression data for 5 year, 10 year, and 15 year survival prediction. When we replace the gene expression data by PAM50 subtype, our results are significant only for 5 year and 15 year prediction. We obtain significantly better results using the RSF model over the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival. Conclusions/Significance: Our results indicate that we can obtain improved survival prediction using clinical data and gene expression data compared to prediction using only clinical data. We further conclude that we can obtain improved survival prediction using the RSF model instead of the Cox model. These results are significant because by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.
Date: 2015
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0117658 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 17658&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0117658
DOI: 10.1371/journal.pone.0117658
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().