
Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data

Yanhe Wang, Wei Wei, Zhuodong Liu, Jiahe Liu, Yinzhen Lv and Xiangyu Li
Additional contact information
Yanhe Wang: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Wei Wei: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Zhuodong Liu: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Jiahe Liu: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Yinzhen Lv: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Xiangyu Li: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Mathematics, 2025, vol. 13, issue 15, 1-27

Abstract: High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP)-enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25,642 observations from 3776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99.34% explained variance, achieving an RMSE of 0.082 and R² of 0.299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSR_SocR, 1.5 for CSR_S, and 5 for CSR_CV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.
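
As a rough illustration of the additive attribution idea behind SHAP mentioned in the abstract, the sketch below computes exact Shapley values for a small toy predictor. It is not the authors' pipeline: `toy_model`, its coefficients, the three unnamed features, and the all-zero baseline are hypothetical choices made only for this example, and "absent" features are simply set to their baseline value (one common simplification). The U-shaped term is included only to echo the kind of non-linear relationship the abstract describes.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical predictor: U-shaped in x[0], linear in x[1], x[0]*x[2] interaction."""
    return (x[0] - 1.5) ** 2 + 0.5 * x[1] + 0.1 * x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` switched to x.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


x = [4.0, 2.0, 1.0]          # hypothetical observation
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)

# Additivity: the attributions sum to the prediction minus the baseline prediction.
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The enumeration over all coalitions grows exponentially with the number of features, so with 40 predictors as in the study one would rely on an approximation or a model-specific algorithm rather than this brute-force form.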

Keywords: machine learning; SHAP interpretability; financial prediction modeling; high-dimensional data analysis; corporate social responsibility; U-shaped relationships
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/15/2526/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/15/2526/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:15:p:2526-:d:1718948

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-08-07
Handle: RePEc:gam:jmathe:v:13:y:2025:i:15:p:2526-:d:1718948