Partial dependence analysis of financial ratios in predicting company defaults: random forest vs XGBoost models
Monia Antar and
Tahar Tayachi
Additional contact information
Monia Antar: American University in the Emirates
Tahar Tayachi: American University in the Emirates
Digital Finance, 2025, vol. 7, issue 4, No 16, 997-1012
Abstract:
In this paper, we investigate the use of machine learning models to predict credit defaults from financial ratios. We compare the performance and interpretability of two ensemble learning algorithms: Random Forest and XGBoost. To improve the models' ability to detect defaults, we address the inherent class imbalance of default prediction tasks by balancing the dataset with the ROSE (Random Over-Sampling Examples) technique. Both models are trained on imbalanced and balanced datasets. We evaluate the models' performance using Accuracy, Sensitivity, Specificity, F1 Score, and AUC (Area Under the ROC Curve). We also validate model performance using Rank Graduation Accuracy (RGA) to assess ranking consistency, revealing superior predictive power on imbalanced data (RGA = 0.991–0.993) versus balanced distributions (RGA = 0.959–0.965). Contrary to oversampling orthodoxy, ROSE balancing degraded performance, in line with theoretical critiques of synthetic data in mature classifiers. We interpret the models by calculating feature importance using Shapley-Lorenz values. Partial Dependence Plots (PDPs) help to visualize how key financial ratios affect the predicted probability of default. Results show non-linear relationships between default risk and key financial ratios such as Return on Assets (R6) and Debt to Equity Ratio (R8). The key features identified are similar for Random Forest and XGBoost, though the interpretation of feature importance differs slightly. To enhance the robustness and credibility of our feature effect analysis, we also produced ALE (Accumulated Local Effects) plots, which provide a more robust framework that accounts for feature interactions. This study advances credit default prediction in Tunisia's banking sector by enhancing interpretability through Accumulated Local Effects analysis alongside Partial Dependence Plots, providing robust insights into feature effects, particularly for key financial ratios. Results offer insights into important ratio thresholds and their impact on default probability prediction, such as the sharp drop in default risk when R6 becomes positive. These advancements provide regulators and financial institutions with more reliable tools for credit risk assessment in Tunisia's economic context, bridging the gap between sophisticated machine learning techniques and practical, interpretable financial decision-making.
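As an illustration of the partial dependence idea mentioned in the abstract, the following is a minimal sketch in pure Python. It is not the authors' code or data: the scoring function `toy_default_probability`, its coefficients, the feature ranges, and the grid are all hypothetical stand-ins for a fitted Random Forest or XGBoost model. The sketch only shows the mechanics: for each grid value of one ratio, that ratio is forced to the value in every observation and the predictions are averaged.

```python
import math
import random


def toy_default_probability(row):
    """Hypothetical stand-in for a fitted classifier (NOT the paper's model).

    row = (return_on_assets, debt_to_equity); returns a probability in (0, 1).
    The extra penalty for positive ROA is an assumption made only to mimic a
    threshold effect in this toy setting.
    """
    roa, debt_to_equity = row
    score = -4.0 * roa + 0.5 * debt_to_equity - 1.0
    if roa > 0:
        score -= 2.0
    return 1.0 / (1.0 + math.exp(-score))


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            total += predict(tuple(modified))
        curve.append(total / len(rows))
    return curve


# Synthetic observations (assumed ranges, for illustration only).
random.seed(0)
rows = [(random.uniform(-0.3, 0.3), random.uniform(0.0, 4.0)) for _ in range(500)]

roa_grid = [-0.2, -0.1, -0.01, 0.01, 0.1, 0.2]
pd_curve = partial_dependence(toy_default_probability, rows, 0, roa_grid)

for value, avg in zip(roa_grid, pd_curve):
    print(f"ROA={value:+.2f}  mean predicted default probability={avg:.3f}")
```

Because the averaging runs over the observed values of the other ratios, a PDP can be misleading when ratios are strongly correlated, which is the usual motivation for complementing it with ALE plots as the abstract describes.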
Keywords: Credit risk default; Random Forest; XGBoost; ROSE; Agnostic Methods
JEL-codes: C15 C45 G32 G33 M41
Date: 2025
Downloads:
http://link.springer.com/10.1007/s42521-025-00135-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:digfin:v:7:y:2025:i:4:d:10.1007_s42521-025-00135-6
Ordering information: This journal article can be ordered from
https://www.springer.com/finance/journal/42521
DOI: 10.1007/s42521-025-00135-6
Digital Finance is currently edited by Wolfgang Karl Härdle, Steven Kou and Min Dai
More articles in Digital Finance from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.