Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Noriega, Jomark Pablo; Rivera, Luis Antonio; Herrera, José Alfredo

Affiliations:
Jomark Pablo Noriega: Departamento Académico de Ciencia de la Computación, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
Luis Antonio Rivera: Departamento Académico de Ciencia de la Computación, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru
José Alfredo Herrera: Departamento Académico de Ciencia de la Computación, Universidad Nacional Mayor de San Marcos, Decana de América, Lima 15081, Peru

Data, 2023, vol. 8, issue 11, 1-17

Abstract: In this systematic literature review on the use of Machine Learning (ML) for credit risk prediction, we highlight the need for financial institutions to use Artificial Intelligence (AI) and ML to assess credit risk by analyzing large volumes of information. We posed research questions about the algorithms, metrics, results, datasets, variables, and limitations involved in predicting credit risk. To answer them, we searched renowned databases and identified 52 relevant studies within the microfinance credit industry. We identified the challenges and approaches in credit risk prediction with ML models, including difficulties with the implemented models: their black-box nature, the need for explainable artificial intelligence, the importance of selecting relevant features, multicollinearity, and imbalance in the input data. In answering the research questions, we found that the Boosted category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under the Curve (AUC), Accuracy (ACC), Recall, F1 score (F1), and Precision. Studies mainly use public datasets to compare models and private datasets to generate new knowledge when models are applied in the real world. The most significant limitation identified is how well the data represent reality, and the variables primarily used in the microcredit industry are demographic, operational, and payment-behavior data. This study aims to guide developers of credit risk management tools and software toward the ML methods, metrics, and techniques currently used to forecast credit risk, thereby helping to minimize potential losses from default and to inform risk appetite.
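The abstract names AUC, ACC, Recall, F1, and Precision as the most common evaluation metrics. As an illustration only (not the authors' code), the following dependency-free sketch shows how these metrics are computed for a binary default/non-default classifier. It assumes label 1 means "default" (the positive class), and the function names are hypothetical. AUC is computed with the rank interpretation: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming label 1 = default (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Threshold-based metrics: ACC, Precision, Recall, F1 (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": acc, "Precision": precision, "Recall": recall, "F1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of defaulter vs. non-defaulter scores.

    O(n_pos * n_neg), which is fine for a sketch but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC requires both classes to be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): true labels and model scores for 8 borrowers.
    y_true = [0, 0, 1, 1, 0, 1, 0, 0]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]
    # Assumed decision threshold of 0.5 to turn scores into predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```

Because credit default data are usually imbalanced, ACC alone can look high even for a model that never predicts default. This is one reason threshold-independent AUC and class-sensitive Recall and F1 are typically reported alongside it. In practice, scikit-learn's `sklearn.metrics` module provides `roc_auc_score`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same purpose.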

Keywords: loan; credit risk; prediction; machine learning; systematic literature review
JEL-codes: C8 C80 C81 C82 C83
Date: 2023
Citations: 3 (as listed in EconPapers)

Downloads:
https://www.mdpi.com/2306-5729/8/11/169/pdf (application/pdf)
https://www.mdpi.com/2306-5729/8/11/169/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:8:y:2023:i:11:p:169-:d:1275568

Data is currently edited by Ms. Becky Zhang

Publisher: MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.