Instance sampling in credit scoring: An empirical study of sample size and balancing

Crone, Sven F.; Finlay, Steven

International Journal of Forecasting, 2012, vol. 28, issue 1, 224-238

Abstract: To date, best practice in sampling credit applicants has been established based largely on expert opinion, which generally recommends that small samples of 1500 instances each of both goods and bads are sufficient, and that the heavily biased datasets observed should be balanced by undersampling the majority class. Consequently, the topics of sample sizes and sample balance have not been subject to either formal study in credit scoring, or empirical evaluations across different data conditions and algorithms of varying efficiency. This paper describes an empirical study of instance sampling in predicting consumer repayment behaviour, evaluating the relative accuracies of logistic regression, discriminant analysis, decision trees and neural networks on two datasets across 20 samples of increasing size and 29 rebalanced sample distributions created by gradually under- and over-sampling the goods and bads respectively. The paper makes a practical contribution to model building on credit scoring datasets, and provides evidence that using samples larger than those recommended in credit scoring practice provides a significant increase in accuracy across algorithms.
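The rebalancing idea in the abstract (under-sampling the goods, over-sampling the bads) can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify. The function name `rebalance`, its parameters, and the toy data are all hypothetical, and simple random sampling is assumed.

```python
import random

def rebalance(goods, bads, target_bad_share, mode="under", seed=0):
    """Resample so that bads make up roughly target_bad_share of the sample.

    Hypothetical helper for illustration only (assumes bads are the minority).
    mode="under": randomly drop goods, sampling without replacement.
    mode="over":  randomly duplicate bads, sampling with replacement.
    """
    if not 0 < target_bad_share < 1:
        raise ValueError("target_bad_share must lie strictly between 0 and 1")
    rng = random.Random(seed)
    if mode == "under":
        # Number of goods needed to reach the target share with all bads kept.
        n_goods = round(len(bads) * (1 - target_bad_share) / target_bad_share)
        n_goods = min(n_goods, len(goods))
        return rng.sample(goods, n_goods), list(bads)
    if mode == "over":
        # Number of bads needed to reach the target share with all goods kept.
        n_bads = round(len(goods) * target_bad_share / (1 - target_bad_share))
        extra = max(n_bads - len(bads), 0)
        return list(goods), list(bads) + rng.choices(bads, k=extra)
    raise ValueError("mode must be 'under' or 'over'")

# Toy data (assumed): 900 goods and 100 bads, identified by integer ids.
goods = list(range(900))
bads = list(range(900, 1000))

under_goods, under_bads = rebalance(goods, bads, 0.5, mode="under")
over_goods, over_bads = rebalance(goods, bads, 0.5, mode="over")
```

Under-sampling shrinks the training set to the size dictated by the minority class, while over-sampling keeps every original instance at the cost of duplicated bads. Stepping `target_bad_share` through a grid of values would give a series of rebalanced distributions similar in spirit to those the abstract describes.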

Keywords: Credit scoring; Data pre-processing; Sample size; Under-sampling; Over-sampling; Balancing
Date: 2012

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0169207011001403
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:intfor:v:28:y:2012:i:1:p:224-238

DOI: 10.1016/j.ijforecast.2011.07.006

International Journal of Forecasting is currently edited by R. J. Hyndman