Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Álvarez, Rosa E. Lillo, Sara López-Taruella, María del Monte-Millán, Antonio C. Picornell, Miguel Martín and Juan Romo
Additional contact information
Juan C. Laria: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
M. Carmen Aguilera-Morillo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
Enrique Álvarez: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Rosa E. Lillo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain
Sara López-Taruella: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
María del Monte-Millán: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Antonio C. Picornell: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Miguel Martín: Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
Juan Romo: UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain

Mathematics, 2021, vol. 9, issue 3, 1-14

Abstract: Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

Keywords: variable selection; high dimension; regularization; classification; sparse-group lasso
JEL-codes: C
Date: 2021
Downloads: (external link)
https://www.mdpi.com/2227-7390/9/3/222/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/3/222/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:3:p:222-:d:485789

Mathematics is currently edited by Ms. Emma He

Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:9:y:2021:i:3:p:222-:d:485789