Data Transformation (Pre-processing)
Steven Finlay
Chapter 6 in Credit Scoring, Response Modeling, and Insurance Rating, 2012, pp 144-164 from Palgrave Macmillan
Abstract:
Model construction techniques vary in their sensitivity to the way data is presented to them. Data transformation provides an alternative representation of the data, in the hope that it will lead to a better (more predictive) model than would result from using the data in its original form. Data transformation typically achieves the following outcomes:

Linearization. Transformations are applied so that the relationships between the predictor variables and the dependent variable are (approximately) linear. Linear relationships are important for methods such as linear regression and logistic regression. If the relationships in the data are highly non-linear, these methods will produce poor models. Linearization is less important for non-linear techniques such as CART and neural networks.

Standardization. If one predictor variable takes values in the range 10,000 to 1,000,000 and another takes values in the range 0.01 to 1, then the parameter coefficients (the model weights) will be very different, even if the two variables contribute equally to the model. This is not an issue for all model construction techniques, but as a rule it is good practice to transform interval variables so that they all take values that lie on the same scale.
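The two outcomes described in the abstract can be illustrated with a minimal sketch. The abstract does not say which transformations the chapter uses, so the z-score rescaling and the base-10 logarithm below are assumptions chosen for illustration, and the variable names and data values (`income`, `utilization`, `x`, `y`) are hypothetical.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


def pearson(a, b):
    """Pearson correlation, computed as the mean product of z-scores."""
    return statistics.fmean(p * q for p, q in zip(standardize(a), standardize(b)))


# Standardization: two hypothetical predictors on very different scales
# (assumed values, matching the ranges mentioned in the abstract).
income = [10_000, 55_000, 240_000, 1_000_000]
utilization = [0.01, 0.25, 0.60, 1.00]
income_z = standardize(income)
utilization_z = standardize(utilization)

# Linearization: a hypothetical predictor whose relationship with the
# dependent variable is roughly logarithmic. A log transform (an assumed
# choice, not necessarily the chapter's) makes the relationship close to linear.
x = [1, 10, 100, 1_000, 10_000]
y = [0.0, 1.1, 1.9, 3.0, 4.1]
x_log = [math.log10(v) for v in x]

r_raw = pearson(x, y)      # linear association before the transform
r_log = pearson(x_log, y)  # linear association after the transform
```

After standardization both predictors have mean 0 and standard deviation 1, so their fitted coefficients become comparable in size. Comparing `r_raw` with `r_log` shows how much closer to linear the relationship is after the transform.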
Keywords: Predictor Variable; Neural Network Model; Interval Variable; Model Construction; Data Transformation
Date: 2012
Persistent link: https://EconPapers.repec.org/RePEc:pal:palchp:978-1-137-03169-3_6
Ordering information: This item can be ordered from
http://www.palgrave.com/9781137031693
DOI: 10.1057/9781137031693_6