Analyzing credit risk model problems through natural language processing-based clustering and machine learning: insights from validation reports
Szymon Lis, Mariusz Kubkowski, Olimpia Borkowska, Dobromił Serwa and Jarosław Kuparnik
Journal of Risk Model Validation
Abstract:
This paper applies clustering and machine learning techniques to validation reports to gain insight into issues arising in the development, implementation and maintenance of credit risk models. Natural language processing is used to classify the findings raised in those reports. A total of 657 findings, raised for selected credit risk models in a large banking institution between 2019 and 2022, are grouped into nine categories representing different validation dimensions. Sentence embeddings generated from the titles and descriptions of the findings then serve as predictors in classification models of the validation dimensions. Several clustering methods are compared for grouping similar findings; they identify common issues in each category with an accuracy of more than 60%. Machine learning algorithms, such as logistic regression and extreme gradient boosting (XGBoost), are then used to predict each finding’s category, with XGBoost achieving 80% accuracy. The top 10 predictive words for each category are also determined.
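The embed-then-classify idea described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' method: bag-of-words count vectors stand in for sentence embeddings, and a nearest-centroid rule with cosine similarity stands in for logistic regression or XGBoost. The findings, the category labels and all function names below are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy findings with made-up category labels (assumption:
# none of these texts or labels come from the paper).
FINDINGS = [
    ("Missing data quality checks on input variables", "data"),
    ("Input data contains unexplained missing values", "data"),
    ("Model documentation lacks description of assumptions", "documentation"),
    ("Documentation does not describe model limitations", "documentation"),
    ("Calibration of default probabilities is outdated", "calibration"),
    ("Probabilities require calibration to recent default rates", "calibration"),
]


def embed(text):
    """Bag-of-words count vector: a crude stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(findings):
    """Sum the vectors of each category; cosine similarity ignores the scale."""
    centroids = defaultdict(Counter)
    for text, label in findings:
        centroids[label].update(embed(text))
    return dict(centroids)


def predict(text, centroids):
    """Assign the category whose centroid is most similar to the text."""
    vec = embed(text)
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


centroids = fit_centroids(FINDINGS)
print(predict("Documentation of model assumptions is incomplete", centroids))
```

In a realistic setting the `embed` function would be replaced by a pretrained sentence-embedding model and the centroid rule by a trained classifier, with accuracy measured on held-out findings rather than on the training texts.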
Downloads: (external link)
https://www.risk.net/journal-of-risk-model-validat ... m-validation-reports (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:rsk:journ5:7960013