EconPapers    

Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?

Ahmed Almustfa Hussin Adam Khatir and Marco Bee
Additional contact information
Ahmed Almustfa Hussin Adam Khatir: Department of Economics and Management, University of Trento, Via Inama 5, 38122 Trento, Italy

Risks, 2022, vol. 10, issue 9, 1-22

Abstract: Forecasting the creditworthiness of customers is a central issue in banking. The task requires analyzing large datasets with many variables, for which machine learning algorithms and feature selection techniques are crucial tools. Moreover, the proportions of “good” and “bad” customers are typically imbalanced, so over- and undersampling techniques should be employed. Most investigations in the literature tackle these three issues individually. Since there is little evidence about their joint performance, this paper aims to fill that gap. We use five machine learning classifiers, each combined with different feature selection techniques and various data-balancing approaches. In an empirical analysis of a retail credit bank dataset, we find that the best combination is random forests, random forest recursive feature elimination, and random oversampling.
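
For readers unfamiliar with the data-balancing step named in the abstract, the following is a minimal sketch of random oversampling using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function name `random_oversample`, the toy data, and the seed are all hypothetical. The idea is to duplicate randomly chosen minority-class rows until each class is as large as the majority class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with
    replacement) until every class matches the largest class in size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw the missing number of rows at random, with replacement.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (assumed): 8 "good" customers (label 0), 2 "bad" customers (label 1).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

In practice, resampling of this kind is typically applied to the training split only, so that evaluation data keeps the original class proportions.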

Keywords: machine learning; imbalanced data; feature selection; credit scoring
JEL-codes: C G0 G1 G2 G3 K2 M2 M4
Date: 2022
Citations: 4 (in EconPapers)

Downloads: (external link)
https://www.mdpi.com/2227-9091/10/9/169/pdf (application/pdf)
https://www.mdpi.com/2227-9091/10/9/169/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jrisks:v:10:y:2022:i:9:p:169-:d:895806

Risks is currently edited by Mr. Claude Zhang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-22
Handle: RePEc:gam:jrisks:v:10:y:2022:i:9:p:169-:d:895806