 

Analyzing credit risk model problems through natural language processing-based clustering and machine learning: insights from validation reports

Szymon Lis, Mariusz Kubkowski, Olimpia Borkowska, Dobromił Serwa and Jarosław Kuparnik

Journal of Risk Model Validation

Abstract: This paper applies clustering and machine learning techniques to validation reports to gain insight into issues arising in credit risk model development, implementation and maintenance. Natural language processing is used to classify issues based on the findings raised in those reports. A total of 657 findings, raised for selected credit risk models at a large banking institution between 2019 and 2022, are grouped into nine categories representing different validation dimensions. Sentence embeddings generated from the titles and descriptions of the findings then serve as predictors in classification models of the validation dimensions. Several clustering methods are compared for grouping similar findings; they identify common issues in each category with an accuracy of more than 60%. Machine learning algorithms, such as logistic regression and extreme gradient boosting (XGBoost), are then used to predict each finding's category, with XGBoost achieving 80% accuracy. Finally, the top 10 predictive words for each category are determined.
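As a rough illustration of what ranking predictive words per category can look like, the sketch below scores each word by a smoothed log-odds ratio (frequency inside a category versus all other categories) and classifies a new finding by summing those scores. This is a deliberately simple stand-in and not the authors' method: the abstract describes sentence embeddings with logistic regression and XGBoost, whereas this sketch uses plain word counts. The toy findings, the category labels and the helper names (`word_scores`, `top_words`, `predict`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy findings (text, category); not taken from the paper's data.
FINDINGS = [
    ("missing values in default flag data", "data quality"),
    ("duplicate records in input data", "data quality"),
    ("data source not reconciled", "data quality"),
    ("model documentation incomplete", "documentation"),
    ("documentation lacks assumptions section", "documentation"),
    ("outdated documentation of model changes", "documentation"),
    ("calibration underestimates default rate", "calibration"),
    ("calibration not tested on recent period", "calibration"),
    ("default rate calibration drift", "calibration"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


def word_scores(findings, alpha=1.0):
    """Smoothed log-odds of each word in a category versus all other categories."""
    counts = defaultdict(Counter)
    for text, label in findings:
        counts[label].update(tokenize(text))
    total = Counter()
    for c in counts.values():
        total.update(c)
    vocab_size = len(total)
    n_total = sum(total.values())
    scores = {}
    for label, c in counts.items():
        n_in = sum(c.values())
        n_out = n_total - n_in
        scores[label] = {
            w: math.log((c[w] + alpha) / (n_in + alpha * vocab_size))
            - math.log((total[w] - c[w] + alpha) / (n_out + alpha * vocab_size))
            for w in total
        }
    return scores


def top_words(scores, label, k=10):
    """The k highest-scoring words for a category (ties broken alphabetically)."""
    return sorted(scores[label], key=lambda w: (-scores[label][w], w))[:k]


def predict(scores, text):
    """Pick the category whose summed word scores are highest for the text."""
    tokens = tokenize(text)
    return max(
        sorted(scores),
        key=lambda lab: sum(scores[lab].get(t, 0.0) for t in tokens),
    )


scores = word_scores(FINDINGS)
for label in sorted(scores):
    print(label, top_words(scores, label, k=3))
print(predict(scores, "incomplete documentation of assumptions"))
```

With real validation findings, the word counts would be replaced by sentence embeddings and the additive score by a trained classifier, as the abstract describes; the log-odds ranking is only one of several ways to surface category-specific vocabulary.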


Downloads: (external link)
https://www.risk.net/journal-of-risk-model-validat ... m-validation-reports (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:rsk:journ5:7960013



 
Page updated 2025-03-19
Handle: RePEc:rsk:journ5:7960013