Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

Akinjole, Abisola; Shobayo, Olamilekan; Popoola, Jumoke; Okoyeigbo, Obinna; Ogunleye, Bayode

Abisola Akinjole, Olamilekan Shobayo, Jumoke Popoola, Obinna Okoyeigbo and Bayode Ogunleye
Additional contact information
Abisola Akinjole: School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK
Olamilekan Shobayo: School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK
Jumoke Popoola: School of Computing and Digital Technologies, Sheffield Hallam University, Sheffield S1 2NU, UK
Obinna Okoyeigbo: Department of Engineering, Edge Hill University, Ormskirk L39 4QP, UK
Bayode Ogunleye: Department of Computing & Mathematics, University of Brighton, Brighton BN2 4GJ, UK

Mathematics, 2024, vol. 12, issue 21, 1-32

Abstract: Predicting credit default risk is important to financial institutions: accurately estimating the likelihood that a borrower will default on a loan helps to reduce financial losses and thereby maintain profitability and stability. Although machine learning models have been used to assess large sets of applications with complex attributes, there is still a need to identify the most effective techniques for model development, including techniques for addressing data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, Support Vector Machines (SVMs), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and the multi-layer perceptron for predicting credit defaults using loan data from LendingClub. XGBoost was also used as a framework for testing and evaluating various techniques. In particular, we applied this XGBoost framework to the observed class imbalance by testing several resampling methods: Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and two hybrid approaches, SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbours (SMOTE + ENNs). The results showed that models trained on balanced datasets significantly outperformed those trained on the imbalanced dataset, with SMOTE + ENNs delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to enhance performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, showing the potential of ensemble methods to improve credit default prediction and to provide lending platforms with a tool for reducing default rates and financial losses.
In conclusion, the findings from this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
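
The abstract names Random Over-Sampling (ROS) among the resampling methods and reports accuracy, precision and recall. The following is a minimal standard-library sketch of those two ideas on toy data, not the authors' pipeline. The function names, the toy labels and the choice of "default" (label 1) as the positive class are all assumptions made for illustration. The paper's best-performing resampler, SMOTE + ENNs, synthesises and cleans samples rather than duplicating them and is not reproduced here.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (plain ROS, hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy, made-up data: 8 repaid loans (label 0) and 2 defaults (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

# After ROS both classes have 8 rows; the original 10 rows are kept as-is.
balanced_rows, balanced_labels = random_over_sample(rows, labels)
class_counts = Counter(balanced_labels)

# A degenerate "model" that never predicts a default still scores 80%
# accuracy on the imbalanced data, while precision and recall for the
# default class are both zero.
always_repaid = [0] * len(labels)
baseline = binary_metrics(labels, always_repaid)
print(class_counts, baseline)
```

The baseline at the end illustrates why precision and recall are reported alongside accuracy when classes are imbalanced. In practice, resampling would normally be applied to the training split only, so that evaluation reflects the original class distribution.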

Keywords: credit default prediction; deep learning; ensemble learning; machine learning
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/21/3423/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/21/3423/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:21:p:3423-:d:1511857

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.