Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence
Yize Zhao,
Matthias Chung,
Brent A. Johnson,
Carlos S. Moreno and
Qi Long
Journal of the American Statistical Association, 2016, vol. 111, issue 516, 1427-1439
Abstract:
Our work is motivated by a prostate cancer study aimed at identifying mRNA and miRNA biomarkers that are predictive of cancer recurrence after prostatectomy. It has been shown in the literature that incorporating known biological information on pathway memberships and interactions among biomarkers improves feature selection of high-dimensional biomarkers in relation to disease risk. Biological information is often represented by graphs or networks, in which biomarkers are represented by nodes and interactions among them are represented by edges; however, biological information is often not fully known. For example, the role of microRNAs (miRNAs) in regulating gene expression is not fully understood and the miRNA regulatory network is not fully established, in which case new strategies are needed for feature selection. To this end, we treat unknown biological information as missing data (i.e., missing edges in graphs), different from commonly encountered missing data problems where variable values are missing. We propose a new concept of imputing unknown biological information based on observed data and define the imputed information as the novel biological information. In addition, we propose a hierarchical group penalty to encourage sparsity and feature selection at both the pathway level and the within-pathway level, which, combined with the imputation step, allows for incorporation of known and novel biological information. While it is applicable to general regression settings, we develop and investigate the proposed approach in the context of semiparametric accelerated failure time models motivated by our data example. Data application and simulation studies show that incorporation of novel biological information improves performance in risk prediction and feature selection and the proposed penalty outperforms the extensions of several existing penalties. Supplementary materials for this article are available online.
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://hdl.handle.net/10.1080/01621459.2016.1164051 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:111:y:2016:i:516:p:1427-1439
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UASA20
DOI: 10.1080/01621459.2016.1164051
Access Statistics for this article
Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson
More articles in Journal of the American Statistical Association from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().