Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data
Yanhe Wang,
Wei Wei,
Zhuodong Liu,
Jiahe Liu,
Yinzhen Lv and
Xiangyu Li
Additional contact information
Yanhe Wang: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Wei Wei: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Zhuodong Liu: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Jiahe Liu: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Yinzhen Lv: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Xiangyu Li: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Mathematics, 2025, vol. 13, issue 15, 1-27
Abstract:
High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3,776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R² of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.
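The SHAP attributions mentioned in the abstract rest on Shapley values, which split a prediction's deviation from a baseline among the input features. The sketch below is not the authors' pipeline: it computes exact Shapley values by brute-force subset enumeration for a hypothetical three-feature toy model (`toy_model`, with an assumed U-shaped term centered at 10 that loosely echoes the reported CSR_SocR threshold). The function names, coefficients, and input values are all illustrative assumptions; practical work on tree ensembles would normally rely on a dedicated SHAP implementation instead of enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` relative to `baseline`.

    Features absent from a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with coalition features taken from x,
        # and all remaining features taken from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model: U-shaped in `csr`, with an soe-by-size interaction."""
    csr, soe, size = z
    return 0.01 * (csr - 10.0) ** 2 + 0.5 * soe + 0.1 * size * soe


# Illustrative instance and baseline (made-up values, not from the paper).
x = [14.0, 1.0, 2.0]
baseline = [10.0, 0.0, 1.0]

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["csr", "soe", "size"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The attributions sum to the difference between the model output at `x` and at `baseline` (the efficiency property), which is what allows SHAP plots to be read as an additive decomposition of each prediction. Because the U-shaped term is quadratic in the distance from 10, moving `csr` the same distance in either direction yields the same contribution.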
Keywords: machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/15/2526/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/15/2526/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:15:p:2526-:d:1718948
Mathematics is currently edited by Ms. Emma He