Dimensionality Reduction
Frank Acito
Frank Acito: Indiana University
Chapter 5 in Predictive Analytics with KNIME, 2023, pp 85-103 from Springer
Abstract:
In business analytics and predictive modeling, data sets often contain hundreds or even thousands of predictor variables, which can hurt both efficiency and effectiveness. This chapter examines the problems associated with large numbers of variables and the main approaches to dimension reduction that address them. The “curse of dimensionality” refers to the exponential increase in the number of observations needed to maintain predictive model accuracy as the number of predictors increases. Moreover, including irrelevant or redundant variables can reduce the performance of predictive models; surprisingly, even too many relevant variables can diminish overall accuracy. An excessive number of variables has other undesirable effects as well: computer processing time increases, and predictive models become more complex and harder to maintain. Redundant variables can make a model unstable. Variables unrelated to the target, such as customer ID numbers, should be removed, as should variables that raise regulatory concerns. To mitigate these challenges, three general approaches to dimension reduction are discussed: manually removing variables based on specific criteria, using algorithms to select the most predictive variables, and applying principal component analysis (PCA) to create linear combinations of the original variables. The chapter emphasizes carefully deciding which variables to retain and which to exclude in order to balance predictive power against model complexity. It concludes by acknowledging the trade-offs involved in dimension reduction and the need for thoughtful analysis when handling large numbers of predictor variables in applied settings.
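As a rough illustration of the third approach (PCA creating linear combinations of the original variables), the sketch below computes principal components in plain Python using power iteration with deflation. It is not the chapter's KNIME workflow; the function names (`pca`, `top_eigenpair`) and the three-variable data set are hypothetical, chosen so that one variable is a near-duplicate of another.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so every variable is centered at zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data (divides by n - 1)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpair(matrix, rng, iters=1000, tol=1e-12):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric PSD matrix."""
    p = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]  # random start avoids an orthogonal guess
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction reachable from v
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v'Av gives the eigenvalue (variance along v)
    eigval = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigval


def pca(rows, k, seed=0):
    """Return component scores, loadings, and share of total variance for k components."""
    rng = random.Random(seed)
    centered = mean_center(rows)
    cov = covariance(centered)
    p = len(cov)
    total_var = sum(cov[i][i] for i in range(p))
    a = [row[:] for row in cov]
    loadings, variances = [], []
    for _ in range(k):
        v, lam = top_eigenpair(a, rng)
        loadings.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v' so the next pass finds the next component
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Each score is a linear combination of the centered original variables
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in loadings] for r in centered]
    explained = [lam / total_var for lam in variances]
    return scores, loadings, explained


# Hypothetical data: x2 is almost a copy of x1 (redundant), x3 is unrelated noise.
gen = random.Random(42)
data = []
for _ in range(200):
    x1 = gen.gauss(0.0, 1.0)
    x2 = 2.0 * x1 + gen.gauss(0.0, 0.1)
    x3 = gen.gauss(0.0, 1.0)
    data.append([x1, x2, x3])

scores, loadings, explained = pca(data, k=2)
print([round(e, 3) for e in explained])
```

Because x1 and x2 carry nearly the same information, the first component absorbs most of their combined variance, and two components retain almost all of the variance of the three original variables. In practice a library implementation would normally replace this hand-rolled version; the explicit loop is only meant to make the linear-combination idea visible.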
Date: 2023
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-031-45630-5_5
Ordering information: This item can be ordered from http://www.springer.com/9783031456305
DOI: 10.1007/978-3-031-45630-5_5
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.