On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models

Seibold, Heidi; Bernau, Christoph; Boulesteix, Anne-Laure; De Bin, Riccardo

Additional contact information
Heidi Seibold: LMU Munich
Christoph Bernau: Leibniz Supercomputing Centre
Anne-Laure Boulesteix: LMU Munich
Riccardo De Bin: LMU Munich

Computational Statistics, 2018, vol. 33, issue 3, No 6, 1195-1215

Abstract: In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection and their ability to shrink the estimated regression coefficients toward zero make these techniques well suited to fitting prediction models to high-dimensional data, e.g. gene expression data. Their prediction performance, however, depends strongly on specific tuning parameters, in particular on the number of boosting iterations. This crucial parameter is usually selected via cross-validation, whose outcome may itself depend strongly on a purely random component, namely the fold partition used. We empirically study how much this randomness affects the results of the boosting techniques, in terms of the selected predictors and the prediction ability of the resulting models. We use four publicly available data sets related to four different diseases. In these studies, the goal is to predict survival end-points when a large number of continuous candidate predictors are available. We focus on two well-known boosting approaches implemented in the R packages CoxBoost and mboost, assuming that the proportional hazards assumption holds and that the effects of the predictors are linear. We show that the variability in the selected predictors and in the prediction ability of the model is reduced by averaging over several repetitions of cross-validation when selecting the tuning parameters.
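The idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use CoxBoost or mboost: as a simplified stand-in for Cox boosting it applies componentwise least-squares (L2) boosting to simulated data. All function names (`make_data`, `boost_path`, `cv_error_curve`, `select_steps`) and all settings (step size 0.1, 5 folds, at most 50 steps) are assumptions chosen for illustration only.

```python
import random

def make_data(n=40, p=10, seed=0):
    """Simulate centred toy data; only the first two predictors carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y

def boost_path(X, y, max_steps, nu=0.1):
    """Componentwise L2 boosting without intercept.

    Returns the coefficient vector after 0, 1, ..., max_steps steps.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    path = [list(beta)]
    sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(max_steps):
        best_j, best_gain, best_b = 0, -1.0, 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue
            dot = sum(X[i][j] * resid[i] for i in range(n))
            b = dot / sq[j]      # least-squares fit of the residuals on x_j
            gain = dot * b       # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, b
        # Update only the best predictor, shrunken by the step size nu.
        beta[best_j] += nu * best_b
        for i in range(n):
            resid[i] -= nu * best_b * X[i][best_j]
        path.append(list(beta))
    return path

def cv_error_curve(X, y, max_steps, k, rng):
    """Mean squared prediction error per number of steps for one random fold partition."""
    n = len(X)
    idx = list(range(n))
    rng.shuffle(idx)             # the random component studied in the paper
    folds = [idx[f::k] for f in range(k)]
    err = [0.0] * (max_steps + 1)
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        path = boost_path([X[i] for i in train], [y[i] for i in train], max_steps)
        for m, beta in enumerate(path):
            for i in test:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                err[m] += (y[i] - pred) ** 2
    return [e / n for e in err]

def select_steps(X, y, max_steps=50, k=5, repeats=1, seed=0):
    """Pick the number of steps minimising the CV error summed over `repeats` partitions."""
    rng = random.Random(seed)
    total = [0.0] * (max_steps + 1)
    for _ in range(repeats):
        curve = cv_error_curve(X, y, max_steps, k, rng)
        total = [t + c for t, c in zip(total, curve)]
    return min(range(max_steps + 1), key=lambda m: total[m])

X, y = make_data()
single = [select_steps(X, y, repeats=1, seed=s) for s in range(3)]
averaged = [select_steps(X, y, repeats=5, seed=s) for s in range(3)]
print("steps chosen by a single CV run:   ", single)
print("steps chosen by averaging 5 CV runs:", averaged)
```

Comparing the spread of `single` with that of `averaged` across seeds gives a rough impression of how strongly the chosen number of steps depends on the fold partition. The exact numbers depend on the simulated data and say nothing about the survival data sets analysed in the paper.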

Keywords: Boosting; Cross-validation; Parameter tuning; High dimensional data; Survival analysis
Date: 2018

Downloads: (external link)
http://link.springer.com/10.1007/s00180-017-0773-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:33:y:2018:i:3:d:10.1007_s00180-017-0773-8

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2

DOI: 10.1007/s00180-017-0773-8

Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik
