A study on credit scoring modeling with different feature selection and machine learning approaches
Shrawan Kumar Trivedi
Technology in Society, 2020, vol. 63, issue C
Abstract:
A big hurdle for financial institutions is deciding which applicants should be granted a line of credit, that is, identifying the right people who do not pose a credit risk. For such a crucial decision, past demographic and financial data on debtors are important for building an automated, artificial-intelligence-based credit score prediction model with a machine learning classifier. In addition, building robust and accurate machine learning models requires selecting the important input predictors (debtor information). The present computational work focuses on building a credit scoring prediction model using the publicly available German credit dataset. An improvement in credit scoring prediction is shown using different feature selection techniques (Information-Gain, Gain-Ratio and Chi-Square) and machine learning classifiers (Bayesian, Naïve Bayes, Random Forest, Decision Tree (C5.0) and SVM (Support Vector Machine)). Further, a comparative analysis is performed across the machine learning classifiers and across the feature selection techniques. Several evaluation metrics are used to analyze model performance: accuracy, F-measure, false positive rate, false negative rate and training time. Based on this analysis, the best combination of machine learning classifier and feature selection technique is identified. In this study, the combination of Random Forest (RF) and Chi-Square (CS) performs best among the combinations tested, with good accuracy and F-measure and low false positive and false negative rates. However, the training time for this combination was slightly higher. The results of C5.0 with Chi-Square were comparable to those of the best combination. This study offers financial institutions an opportunity to build automated models for better credit scoring.
Keywords: Credit scoring; Machine learning classifier; Random forest; C5.0; SVM; Naïve bayes; Bayesian; Feature selection technique; Information-gain; Gain-ratio; Chi-square
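The abstract names three filter-style feature-selection criteria and several evaluation metrics without defining them. The Python sketch below computes the standard textbook forms of information gain (H(Y) minus H(Y|X)), gain ratio (information gain divided by the entropy of the feature) and the Pearson chi-square statistic for a categorical predictor, plus accuracy, F-measure, false positive rate and false negative rate from binary predictions. It is not the author's implementation: the toy columns, the `good`/`bad` labels, treating `bad` as the positive class, and all function names are hypothetical, and continuous attributes such as credit amount would need to be discretized before scoring.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """H(labels) minus the expected label entropy after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the split information H(feature)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for v in f_counts:
        for c in y_counts:
            expected = f_counts[v] * y_counts[c] / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score):
    """Return column names sorted from most to least relevant under `score`."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


def confusion_metrics(y_true, y_pred, positive="bad"):
    """Accuracy, F-measure, FPR and FNR, treating `positive` (assumed: bad credit) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f_measure": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical toy data: two categorical predictors and a good/bad credit label.
labels = ["good", "good", "bad", "bad"]
columns = {
    "checking_status": ["A", "A", "B", "B"],  # perfectly separates the two classes
    "phone_listed": ["X", "Y", "X", "Y"],     # carries no information about the label
}

if __name__ == "__main__":
    for score in (information_gain, gain_ratio, chi_square):
        print(score.__name__, rank_features(columns, labels, score))
```

Keeping the top-ranked columns under one criterion and then training a classifier (for example a random forest) on only those columns roughly corresponds to the select-then-classify setup the abstract compares; the classifiers themselves are outside this sketch.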
Date: 2020
Citations: 14 (EconPapers)
Downloads: http://www.sciencedirect.com/science/article/pii/S0160791X17302324 (full text for ScienceDirect subscribers only)
Persistent link: https://EconPapers.repec.org/RePEc:eee:teinso:v:63:y:2020:i:c:s0160791x17302324
DOI: 10.1016/j.techsoc.2020.101413
Technology in Society is currently edited by Charla Griffy-Brown
Technology in Society is published by Elsevier.
Bibliographic data for series maintained by Catherine Liu.