Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning
Nicole Ellenbach,
Anne-Laure Boulesteix,
Bernd Bischl,
Kristian Unger and
Roman Hornung
Affiliations:
Nicole Ellenbach: University of Munich
Anne-Laure Boulesteix: University of Munich
Bernd Bischl: University of Munich
Kristian Unger: Helmholtz Zentrum Munich, German Research Center for Environmental Health GmbH
Roman Hornung: University of Munich
Journal of Classification, 2021, vol. 38, issue 2, No 3, 212-231
Abstract:
In many application areas, prediction rules trained on high-dimensional data are subsequently applied to make predictions for observations from other sources, where they do not always perform well. This is because data sets from different sources can feature (slightly) differing distributions, even if they come from similar populations. In the context of high-dimensional data and beyond, most prediction methods involve one or several tuning parameters, whose values are commonly chosen by maximizing the cross-validated prediction performance on the training data. This procedure, however, implicitly presumes that the data to which the prediction rule will ultimately be applied follow the same distribution as the training data. If this is not the case, less complex prediction rules that slightly underfit the training data may be preferable. Indeed, a tuning parameter controls not only the degree of adjustment of a prediction rule to the training data but also, more generally, the degree of adjustment to the distribution of the training data. Building on this idea, in this paper we compare various approaches, including new procedures, for choosing tuning parameter values that lead to better generalizing prediction rules than those obtained by cross-validation. Most of these approaches use an external validation data set. In our extensive comparison study based on a large collection of 15 transcriptomic data sets, tuning on external data and robust tuning with a tuned robustness parameter are the two approaches that lead to better generalizing prediction rules.
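To make the contrast between the two tuning strategies concrete, the following is a minimal sketch in Python, not the authors' implementation: it tunes the regularization strength C of an L2-penalized logistic regression once by cross-validation on the training data and once on an external validation set whose feature means are shifted to mimic a batch effect. The synthetic data, the choice of scikit-learn, logistic regression, and the grid of C values are all assumptions made for illustration; the paper's study instead uses 15 real transcriptomic data sets and its own robust tuning procedure, which is not reproduced here.

# Illustrative sketch only (synthetic data); not the authors' method from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # `shift` perturbs the feature means to mimic a batch effect
    # between data sources drawn from similar populations.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 50))
    beta = np.zeros(50)
    beta[:5] = 1.0  # sparse true signal
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X, rng.binomial(1, p)

X_train, y_train = make_data(200)             # training source
X_ext, y_ext = make_data(100, shift=0.3)      # external validation source
X_test, y_test = make_data(500, shift=0.3)    # target source

# Candidate values of the regularization parameter C
# (smaller C = less complex rule, i.e. mild underfitting).
grid = [0.001, 0.01, 0.1, 1.0, 10.0]

# (a) Classical tuning: maximize cross-validated accuracy on the training data.
cv_scores = [cross_val_score(LogisticRegression(C=c, max_iter=1000),
                             X_train, y_train, cv=5).mean() for c in grid]
c_cv = grid[int(np.argmax(cv_scores))]

# (b) Tuning on external data: fit on the training set, then pick the C
# that performs best on the external validation source.
ext_scores = []
for c in grid:
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    ext_scores.append(accuracy_score(y_ext, model.predict(X_ext)))
c_ext = grid[int(np.argmax(ext_scores))]

# Compare both tuned rules on the (shifted) target source.
for label, c in [("CV-tuned", c_cv), ("externally tuned", c_ext)]:
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{label:>16}: C={c}, accuracy on target source = {acc:.3f}")

Under such a shift, the externally tuned rule will often select a smaller C, i.e. a less complex rule that slightly underfits the training distribution, which is precisely the effect the abstract describes.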
Keywords: Prediction; Robust modeling; Tuning parameter value optimization; Batch effects
Date: 2021
Downloads: http://link.springer.com/10.1007/s00357-020-09368-z (abstract, text/html; full-text access restricted)
Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:38:y:2021:i:2:d:10.1007_s00357-020-09368-z
DOI: 10.1007/s00357-020-09368-z