An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Munkhdalai, Lkhagvadorj; Munkhdalai, Tsendsuren; Namsrai, Oyun-Erdene; Lee, Jong Yun; Ryu, Keun Ho

Additional contact information
Lkhagvadorj Munkhdalai: Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
Tsendsuren Munkhdalai: Microsoft Research, Montreal, QC H3A 3H3, Canada
Oyun-Erdene Namsrai: Department of Information and Computer Sciences, National University of Mongolia, Sukhbaatar District, Building#3 Room#212, Ulaanbaatar 14201, Mongolia
Jong Yun Lee: Database/Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
Keun Ho Ryu: Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam

Sustainability, 2019, vol. 11, issue 3, 1-23

Abstract: Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Our main goal in this study is therefore to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. As the SCF data is non-synthetic and consists of a large number of real variables, we applied and compared two variable-selection methods to select the most representative features for effective modelling: the first used hypothesis tests, correlation and random forest-based feature importance measures, and the second was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the 2001s had used their own credit scoring model constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural network and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.

Keywords: automated credit scoring; decision making; machine learning; internet bank; sustainability
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2019

Downloads:
https://www.mdpi.com/2071-1050/11/3/699/pdf (application/pdf)
https://www.mdpi.com/2071-1050/11/3/699/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:11:y:2019:i:3:p:699-:d:201610

Sustainability is currently edited by Ms. Alexandra Wu