Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer
Juan C. Laria,
M. Carmen Aguilera-Morillo,
Enrique Álvarez,
Rosa E. Lillo,
Sara López-Taruella,
María del Monte-Millán,
Antonio C. Picornell,
Miguel Martín and
Juan Romo
Affiliations:
Juan C. Laria: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
M. Carmen Aguilera-Morillo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
Enrique Álvarez: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Rosa E. Lillo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
Sara López-Taruella: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
María del Monte-Millán: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Antonio C. Picornell: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Miguel Martín: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Juan Romo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
Mathematics, 2021, vol. 9, issue 3, 1-14
Abstract:
Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole-genome context. However, how to define the list of genes that characterizes an expression profile remains an open question. Current approaches rely on advanced statistical methods and may take an agnostic point of view or incorporate some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology for variable selection and model estimation in the high-dimensional setting, which can be particularly useful in the whole-genome context. The results are validated on simulated data and on a real dataset from a triple-negative breast cancer study.
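Among the regularized regression methods mentioned above, the sparse-group lasso (listed in the keywords) combines a lasso penalty, which zeroes individual coefficients, with a group-lasso penalty, which zeroes whole groups of coefficients. As a generic illustration only, and not the authors' iterative selection procedure (which this abstract does not spell out), the sketch below computes the standard sparse-group lasso penalty lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2) and its proximal operator, assuming the groups partition the coefficient indices and use the common sqrt(p_g) group weights. The function names `sgl_penalty` and `sgl_prox` and the parameters `lam`, `alpha` and `step` are hypothetical.

```python
import math
from typing import List, Sequence


def sgl_penalty(beta: Sequence[float], groups: Sequence[Sequence[int]],
                lam: float, alpha: float) -> float:
    """Sparse-group lasso penalty (sketch; groups assumed to partition indices)."""
    l1 = sum(abs(b) for b in beta)  # lasso part: sum of |beta_j|
    group_term = 0.0
    for g in groups:
        norm_g = math.sqrt(sum(beta[j] ** 2 for j in g))  # ||beta_g||_2
        group_term += math.sqrt(len(g)) * norm_g          # weight sqrt(p_g)
    return lam * (alpha * l1 + (1.0 - alpha) * group_term)


def sgl_prox(z: Sequence[float], groups: Sequence[Sequence[int]],
             lam: float, alpha: float, step: float) -> List[float]:
    """Proximal operator of step * sgl_penalty, applied group by group."""
    out = list(z)
    for g in groups:
        # Step 1: elementwise soft-thresholding (lasso part).
        u = [math.copysign(max(abs(z[j]) - step * lam * alpha, 0.0), z[j])
             for j in g]
        # Step 2: shrink the whole group towards zero (group-lasso part).
        norm_u = math.sqrt(sum(v * v for v in u))
        thresh = step * lam * (1.0 - alpha) * math.sqrt(len(g))
        scale = max(1.0 - thresh / norm_u, 0.0) if norm_u > 0 else 0.0
        for j, v in zip(g, u):
            out[j] = scale * v
    return out


if __name__ == "__main__":
    beta = [3.0, -4.0, 0.0, 0.5]
    groups = [[0, 1], [2, 3]]
    print(sgl_penalty(beta, groups, lam=1.0, alpha=0.5))
    print(sgl_prox(beta, groups, lam=1.0, alpha=0.5, step=1.0))
```

In the usage example, the second group has only small coefficients and is set entirely to zero by the proximal step, while the first group is shrunk but kept. This is the kind of simultaneous within-group and between-group sparsity that makes the penalty attractive when genes can be grouped a priori.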
Keywords: variable selection; high dimension; regularization; classification; sparse-group lasso
JEL-codes: C
Date: 2021
Downloads:
https://www.mdpi.com/2227-7390/9/3/222/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/3/222/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:3:p:222-:d:485789
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.