Evaluating classification accuracy: the impact of resampling and dataset size

Imlawi, Jehad; Alsharo, Mohammad

International Journal of Business Information Systems, 2017, vol. 24, issue 1, 91-101

Abstract: Correct prediction is an important criterion for evaluating classifiers in a supervised learning context. The accuracy rate is a widely accepted indicator of a classifier's probability of misclassification. Nevertheless, the true accuracy remains unknown in most cases, since it is not always possible to include the whole population in a study and it is difficult to calculate the probability distribution of the data. Researchers therefore often rely on computing an estimate from the available data through sampling. When the available data is small or limited, it is common to rely on a resampling technique for accuracy estimation. In this paper, we study the impact of resampling versus non-resampling estimation methods, across different dataset sizes, on the variance of the sample distribution. Initial results indicate that there is a significant difference in the variance of the sample distribution between resampling and non-resampling. We also found that the larger the dataset, the less significant the difference in variance.
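
The kind of comparison the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic one-dimensional two-class data, the nearest-centroid classifier, a single hold-out split standing in for "non-resampling", 10-fold cross validation standing in for "resampling", and all function names. It repeats each estimator on freshly drawn datasets of a given size and reports the variance of the resulting accuracy estimates.

```python
import random
import statistics


def make_data(n, rng):
    """Draw n labeled points: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1) (assumed toy data)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data


def fit_centroids(train):
    """Nearest-centroid model: the mean of each class (inf if a class is absent)."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    m0 = statistics.fmean(xs0) if xs0 else float("inf")
    m1 = statistics.fmean(xs1) if xs1 else float("inf")
    return m0, m1


def accuracy(model, test):
    """Fraction of test points whose nearest centroid matches their label."""
    m0, m1 = model
    correct = sum((abs(x - m1) < abs(x - m0)) == (y == 1) for x, y in test)
    return correct / len(test)


def holdout_estimate(data, rng, test_fraction=0.3):
    """Non-resampling stand-in: one random train/test split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return accuracy(fit_centroids(train), test)


def kfold_estimate(data, rng, k=10):
    """Resampling stand-in: mean accuracy over k folds (requires len(data) >= k)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(fit_centroids(train), test))
    return statistics.fmean(scores)


def estimate_variance(estimator, n, repetitions=200, seed=0):
    """Sample variance of the accuracy estimate over repeated datasets of size n."""
    rng = random.Random(seed)
    estimates = [estimator(make_data(n, rng), rng) for _ in range(repetitions)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    for n in (30, 100, 300):
        v_hold = estimate_variance(holdout_estimate, n)
        v_cv = estimate_variance(kfold_estimate, n)
        print(f"n={n:4d}  hold-out variance={v_hold:.5f}  10-fold variance={v_cv:.5f}")
```

Running the script prints one line per dataset size, so the two variances can be compared side by side as the size grows. The numbers depend on the toy data and classifier chosen here and say nothing about the paper's actual datasets or results.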

Keywords: resampling; dataset size; classification accuracy; cross validation; distribution variance; classifier evaluation; supervised learning.
Date: 2017

Downloads: (external link)
http://www.inderscience.com/link.php?id=80947 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijbisy:v:24:y:2017:i:1:p:91-101
