Performance-Driven Dimensionality Reduction: A Data-Centric Approach to Feature Engineering in Machine Learning
Joshua Chung,
Marcos Lopez de Prado,
Horst D. Simon and
Kesheng Wu
Chapter 9 in Transactions of ADIA Lab: Interdisciplinary Advances in Data and Computational Science, 2025, pp. 245-272, from World Scientific Publishing Co. Pte. Ltd.
Abstract:
In a number of applications, data may be anonymized, obfuscated, and highly noisy. In such cases, it is difficult to use domain knowledge and low-dimensional visualizations to engineer features for tasks such as machine learning. In this work, we explore a variety of dimensionality reduction (DR) techniques, in the form of both feature extraction and feature selection, to decrease multicollinearity and improve the predictive power of our models. These techniques include principal component analysis (PCA), locally linear embedding (LLE), Isomap, kernel principal component analysis (KPCA), uniform manifold approximation and projection (UMAP), mean decrease accuracy, Shapley values, and feature clustering. Because our methodology is data-driven, all DR algorithm selection, hyperparameter tuning, and model tuning are based purely on model performance rather than on a priori knowledge. The method shows which techniques increase the predictive power of our random forest model. Because the method is general, it can be applied to regression or classification with any machine learning model and any unsupervised DR technique.
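The performance-driven selection described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes scikit-learn, a synthetic dataset from `make_classification` as a stand-in for anonymized data, a fixed target dimension of 5, and 5-fold cross-validated accuracy as the selection criterion. UMAP is left out because it needs the third-party `umap-learn` package. The names `candidates`, `scores`, and `best` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for anonymized, noisy, collinear data
# (an assumption; the chapter's data is not reproduced here).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=10,
    random_state=0,
)

# Candidate DR steps; "passthrough" is the no-reduction baseline.
# The target dimension (5) and neighbor count (10) are arbitrary choices.
candidates = {
    "none": "passthrough",
    "pca": PCA(n_components=5),
    "kpca": KernelPCA(n_components=5, kernel="rbf"),
    "isomap": Isomap(n_components=5, n_neighbors=10),
    "lle": LocallyLinearEmbedding(n_components=5, n_neighbors=10, random_state=0),
}

scores = {}
for name, reducer in candidates.items():
    # The reducer is fitted inside each CV fold, so the held-out fold
    # never influences the learned projection.
    pipe = Pipeline([
        ("reduce", reducer),
        ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()

# Select the DR technique purely on downstream model performance.
best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} mean CV accuracy = {score:.3f}")
print("selected:", best)
```

Wrapping the reducer and the model in one `Pipeline` keeps the comparison fair, since each technique is scored on the same folds with the same downstream model. Hyperparameters such as `n_components` could be searched the same way, for example with `GridSearchCV` over the pipeline.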
Keywords: Computational Science; Data Science; AI Applications; Climate Science; Medical Imaging; Sustainability; Interdisciplinary Research; Mathematical and Quantitative Finance
JEL-codes: C45 C63 G11 Q54
Date: 2025
Downloads:
https://www.worldscientific.com/doi/pdf/10.1142/9789819813049_0009 (application/pdf)
https://www.worldscientific.com/doi/abs/10.1142/9789819813049_0009 (text/html)
Ebook Access is available upon purchase.
Persistent link: https://EconPapers.repec.org/RePEc:wsi:wschap:9789819813049_0009
Bibliographic data for series maintained by Tai Tone Lim.