Data Pre-processing
Max Kuhn and Kjell Johnson
Additional contact information
Max Kuhn: Pfizer Global Research and Development, Division of Nonclinical Statistics
Kjell Johnson: Arbor Analytics
Chapter 3 in Applied Predictive Modeling, 2013, pp 27-59 from Springer
Abstract:
Data preprocessing techniques generally refer to the addition, deletion, or transformation of training set data. Preprocessing is a crucial step prior to modeling, since data preparation can make or break a model’s predictive ability. To illustrate general preprocessing techniques, we begin by introducing a cell segmentation data set (Section 3.1). This data set contains common predictor problems such as skewness, outliers, and missing values. Sections 3.2 and 3.3 review transformations for single predictors and multiple predictors, respectively. In Section 3.4 we discuss several approaches for handling missing data. Other preprocessing steps may include removing (Section 3.5), adding (Section 3.6), or binning (Section 3.7) predictors, all of which must be done carefully so that predictive information is not lost and erroneous information is not added to the data. The computing section (3.8) provides R syntax for the previously described preprocessing steps. Exercises are provided at the end of the chapter to solidify concepts.
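As a rough illustration of the kinds of steps the abstract lists (handling missing values, reducing skewness, and transforming a single predictor), the following minimal sketch may help. It is written in Python with only the standard library and is not the chapter's R code. The function names, the toy data, and the specific choices (median imputation, a log transform, centering and scaling) are assumptions made for illustration, not methods confirmed by this record.

```python
import math
from statistics import mean, median, pstdev

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_transform(values):
    """Natural log of each value; only valid for strictly positive data."""
    return [math.log(v) for v in values]

def center_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical right-skewed predictor with one missing value.
raw = [1.0, 2.0, None, 4.0, 8.0, 64.0]

filled = impute_median(raw)     # fill the gap first
logged = log_transform(filled)  # then reduce right skew
scaled = center_scale(logged)   # then put the predictor on a common scale

print("skewness before log:", skewness(filled))
print("skewness after log: ", skewness(logged))
```

The order of operations here (impute, transform, then scale) is one plausible choice. In practice, the parameters of each step (the median, mean, and standard deviation) would be estimated on the training set only and then reused on new data.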
Keywords: Principal Component Analysis; Predictor Variable; Partial Least Squares; Systemic Inflammatory Response Syndrome; Multivariate Adaptive Regression Spline
Date: 2013
Citations: 2 (in EconPapers)
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-1-4614-6849-3_3
Ordering information: This item can be ordered from http://www.springer.com/9781461468493
DOI: 10.1007/978-1-4614-6849-3_3