Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings

Julia Gilhodes, Florence Dalenc, Jocelyn Gal, Christophe Zemmour, Eve Leconte, Jean Marie Boher and Thomas Filleron
Additional contact information
Julia Gilhodes: ICR - Institut Claudius Regaud
Florence Dalenc: ICR - Institut Claudius Regaud
Jocelyn Gal: UNICANCER/CAL - Centre de Lutte contre le Cancer Antoine Lacassagne [Nice] - UNICANCER - UniCA - Université Côte d'Azur
Christophe Zemmour: IPC - Institut Paoli-Calmettes - Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale
Eve Leconte: TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement
Jean Marie Boher: IPC - Institut Paoli-Calmettes - Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), SESSTIM - U1252 INSERM - Aix Marseille Univ - UMR 259 IRD - Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale - IRD - Institut de Recherche pour le Développement - AMU - Aix Marseille Université - INSERM - Institut National de la Santé et de la Recherche Médicale
Thomas Filleron: ICR - Institut Claudius Regaud

Post-Print from HAL

Abstract: Over the last decades, molecular signatures have become increasingly important in oncology and are opening up a new area of personalized medicine. Nevertheless, the biological relevance of these signatures and the statistical tools used to develop them have been called into question in the literature. Here, we investigate six typical selection methods for high-dimensional settings and survival endpoints, including LASSO and some of its extensions, component-wise boosting, and random survival forests (RSF). A resampling algorithm based on data splitting was applied to nine high-dimensional simulated datasets to assess selection stability on training sets and the intersection between selection methods. Prognostic performance was evaluated on the corresponding validation sets. Finally, an application to a real breast cancer dataset is presented. The false discovery rate (FDR) was high for every selection method, and the intersection between lists of predictors was very poor. RSF selected many more variables than the other methods and was therefore less efficient on validation sets. Because of the complex correlation structure in genomic data, the stability of the selection procedure is generally poor for selected predictors, but it can be improved with a larger training sample size. In a very high-dimensional setting, we recommend the LASSO-pcvl method, since it outperforms the other methods by reducing the number of selected genes and minimizing the FDR in most scenarios. Nevertheless, this method still yields a high rate of false positives. Further work is thus necessary to propose new methods that overcome this issue when numerous predictors are present. Multidisciplinary discussion between clinicians and statisticians is necessary to ensure both the statistical and the biological relevance of the predictors included in molecular signatures.
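
The evaluation criteria named in the abstract (selection stability across data splits, FDR against known true predictors, and the intersection between lists of predictors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the default 50% training fraction, the number of splits, and the use of the Jaccard index as the intersection measure are all assumptions made for illustration, and `select` is a hypothetical placeholder for any variable selection method such as a penalized Cox model.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set


def false_discovery_rate(selected: Set[str], truth: Set[str]) -> float:
    """Share of selected variables that are not true predictors."""
    if not selected:
        return 0.0  # convention assumed here: no selection means no false discovery
    return len(selected - truth) / len(selected)


def selection_frequency(selections: List[Set[str]]) -> Dict[str, float]:
    """Fraction of resamples in which each variable was selected."""
    counts = Counter(v for s in selections for v in s)
    return {v: c / len(selections) for v, c in counts.items()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two selected lists (assumed intersection measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def resample_selections(
    n_samples: int,
    select: Callable[[List[int]], Iterable[str]],
    n_splits: int = 100,          # assumption: not taken from the paper
    train_fraction: float = 0.5,  # assumption: not taken from the paper
    seed: int = 0,
) -> List[Set[str]]:
    """Run `select` on the training part of repeated random data splits."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_train = int(round(train_fraction * n_samples))
    selections = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        train = sorted(indices[:n_train])  # remaining indices form the validation set
        selections.append(set(select(train)))
    return selections


if __name__ == "__main__":
    # Hand-made selections standing in for three resamples of one method.
    runs = [{"g1", "g2", "g7"}, {"g1", "g7"}, {"g1", "g3"}]
    truth = {"g1", "g2"}  # known only in simulated data
    print([false_discovery_rate(s, truth) for s in runs])
    print(selection_frequency(runs))
    print(jaccard(runs[0], runs[1]))
```

In a simulation the set of true predictors is known, so the FDR can be computed for each resample; on real data only the selection frequencies and the overlap between methods are available.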

Date: 2020-07
New Economics Papers: this item is included in nep-big, nep-ecm and nep-ore
Note: View the original document on HAL open archive server: https://hal.science/hal-02934793v1

Published in Computational and Mathematical Methods in Medicine, 2020, 2020, pp.6795392. ⟨10.1155/2020/6795392⟩

Downloads: (external link)
https://hal.science/hal-02934793v1/document (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-02934793

DOI: 10.1155/2020/6795392

Page updated 2025-03-19
Handle: RePEc:hal:journl:hal-02934793