Understanding the performance of machine learning models to predict credit default: a novel approach for supervisory evaluation
Andres Alonso and
José Manuel Carbó
Additional contact information
José Manuel Carbó: Banco de España
No 2105, Working Papers from Banco de España
In this paper we study the performance of several machine learning (ML) models for credit default prediction, using a unique and anonymized database from a major Spanish bank. We compare the statistical performance of a simple and traditionally used model, the Logistic Regression (Logit), with more advanced ones: Lasso-penalized logistic regression, Classification And Regression Tree (CART), Random Forest, XGBoost and Deep Neural Networks. Following the process deployed for the supervisory validation of Internal Rating-Based (IRB) systems, we examine the benefits of using ML in terms of predictive power, both in classification and calibration. By running a simulation exercise for different sample sizes and numbers of features, we are able to isolate the information advantage associated with access to large amounts of data, and to measure the ML model advantage. Although ML models outperform Logit both in classification and in calibration, more complex ML algorithms do not necessarily predict better. We then translate this statistical performance into economic impact by estimating the savings in regulatory capital when ML models, instead of a simpler model like Lasso, are used to compute the risk-weighted assets. Our benchmark results show that implementing XGBoost could yield savings from 12.4% to 17% in terms of regulatory capital requirements under the IRB approach. This leads us to conclude that the potential economic benefits for the institutions would be significant, which justifies further research to better understand all the risks embedded in ML models.
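The abstract separates predictive power into classification (how well a model ranks defaulters above non-defaulters) and calibration (how close predicted default probabilities are to realised outcomes). As a rough illustration only, the sketch below scores two hypothetical sets of predictions on a synthetic portfolio. The abstract does not name the metrics used, so the choice of ROC AUC for classification and the Brier score for calibration is an assumption here, as are the function names and the simulated data.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the share of (defaulter, non-defaulter) pairs in which the
    defaulter has the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared gap between predicted default probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Synthetic portfolio (assumption: purely illustrative, not the paper's data).
rng = random.Random(0)
true_pd = [rng.uniform(0.01, 0.30) for _ in range(2000)]
defaults = [1 if rng.random() < pd else 0 for pd in true_pd]

# Two hypothetical models: one knows each borrower's true default
# probability, the other assigns the portfolio average to everyone.
informed = true_pd
naive = [sum(defaults) / len(defaults)] * len(defaults)

for name, preds in [("informed", informed), ("naive", naive)]:
    print(name, "AUC:", round(roc_auc(defaults, preds), 3),
          "Brier:", round(brier_score(defaults, preds), 4))
```

A higher AUC indicates better ranking and a lower Brier score indicates better calibrated probabilities. The constant-prediction model cannot rank borrowers at all, so its AUC is 0.5 by construction.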
Keywords: machine learning; credit risk; prediction; probability of default; IRB system
JEL-codes: C38 C45 G21
Pages: 44
New Economics Papers: this item is included in nep-big, nep-cmp and nep-rmg
Downloads: (external link)
https://www.bde.es/f/webbde/SES/Secciones/Publicac ... 21/Files/dt2105e.pdf First version, January 2021 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bde:wpaper:2105
Bibliographic data for series maintained by María Beiro, Electronic Dissemination of Information Unit, Research Department, Banco de España.