Over-Fitting and Model Tuning
Max Kuhn and Kjell Johnson
Additional contact information
Max Kuhn: Pfizer Global Research and Development, Division of Nonclinical Statistics
Kjell Johnson: Arbor Analytics
Chapter 4 in Applied Predictive Modeling, 2013, pp. 61-92, from Springer
Abstract:
Many modern classification and regression models are highly adaptable; they are capable of modeling complex relationships. Each model's adaptability is typically governed by a set of tuning parameters, which can allow each model to pinpoint predictive patterns and structures within the data. However, these tuning parameters can just as easily lead a model to identify predictive patterns that are not reproducible. This is known as “over-fitting.” Models that are over-fit generally have excellent predictivity for the samples on which they were built but poor predictivity for new samples. Without a methodical approach to building and evaluating models, the modeler will not know whether the model is over-fit until the next set of samples is predicted. In Section 4.1 we use a simple example to illustrate the problem of over-fitting. We then describe a systematic process for tuning models (Section 4.2), which is foundational to the remaining parts of the book. Core to model tuning are appropriate ways of splitting (or spending) the data, which are covered in Section 4.3. Resampling techniques (Section 4.4) are an alternative or complementary approach to data splitting. Recommendations for approaches to data splitting are provided in Section 4.7. After evaluating a number of tuning parameter values via data splitting or resampling, we must choose the final tuning parameters (Section 4.6). We also discuss how to choose the optimal model across several tuned models (Section 4.8). We illustrate how to implement the recommended techniques discussed in this chapter in the Computing section (Section 4.9). Exercises are provided at the end of the chapter to solidify concepts.
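To make the abstract's central contrast concrete (excellent fit on the training samples, worse performance on new samples, and tuning by resampling), here is a minimal, hypothetical Python sketch. It is not code from the book. The model (k-nearest-neighbors regression with tuning parameter k), the synthetic sine-plus-noise data, the helper names (`knn_predict`, `make_folds`, `cv_rmse`, `apparent_rmse`), and the choice of 10-fold cross-validation are all illustrative assumptions.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict x as the mean response of its k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def make_folds(n, n_folds, rng):
    """Shuffle the sample indices and deal them into n_folds groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_rmse(x, y, k, folds):
    """Average hold-out RMSE: build on all folds but one, predict the held-out fold."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        tr_x = [x[i] for i in train]
        tr_y = [y[i] for i in train]
        preds = [knn_predict(tr_x, tr_y, x[i], k) for i in held_out]
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


def apparent_rmse(x, y, k):
    """Error on the same samples used to build the model (optimistic)."""
    return rmse(y, [knn_predict(x, y, xi, k) for xi in x])


rng = random.Random(42)  # fixed random number seed so data and folds are reproducible
x = [rng.uniform(0, 10) for _ in range(100)]  # hypothetical synthetic predictor
y = [math.sin(xi) + rng.gauss(0, 0.3) for xi in x]  # hypothetical noisy response
folds = make_folds(len(x), 10, rng)  # the same folds are reused for every k

results = {}
for k in (1, 3, 5, 9, 15, 25):  # candidate tuning parameter values
    results[k] = (apparent_rmse(x, y, k), cv_rmse(x, y, k, folds))
    print(f"k={k:2d}  apparent RMSE={results[k][0]:.3f}  10-fold CV RMSE={results[k][1]:.3f}")

# Choose the tuning parameter with the lowest resampled error.
best_k = min(results, key=lambda k: results[k][1])
print("k selected by cross-validation:", best_k)
```

With k = 1 the apparent error is exactly zero, because each sample is its own nearest neighbor, while its cross-validated error is not; that gap is the kind of over-fitting the abstract describes. Reusing one set of folds for every candidate k keeps the comparison between tuning parameter values on equal footing.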
Keywords: Tuning Parameter; Multivariate Adaptive Regression Spline; Data Splitting; Multivariate Adaptive Regression Spline Model; Random Number Seed
Date: 2013
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-1-4614-6849-3_4
Ordering information: This item can be ordered from
http://www.springer.com/9781461468493
DOI: 10.1007/978-1-4614-6849-3_4