Evaluation of Resampling Methods in the Class Unbalance Problem
Kubus Mariusz
Additional contact information
Kubus Mariusz: Opole University of Technology, Opole, Poland
Econometrics. Advances in Applied Data Analysis, 2020, vol. 24, issue 1, 39-50
Abstract:
Many real-world applications aim to predict rare events, so their training sets are highly unbalanced. In such cases, classifiers are biased towards correct prediction of the majority class and tend to misclassify the minority class, even though the rare events are of greater interest. To handle this problem, numerous techniques have been proposed that either balance the data or modify the learning algorithms. The goal of this paper is to compare simple random balancing methods with more sophisticated resampling methods that have appeared in the literature and are available in R. Additionally, the authors ask whether learning on the original dataset and classifying with a shifted threshold might be more competitive. The authors provide a survey from the perspective of regularized logistic regression and random forests. The results show that combining random under-sampling with random forests has an advantage over the other techniques, while logistic regression can be competitive in the case of highly unbalanced data.
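The two simplest ideas named in the abstract, random under-sampling and a shifted classification threshold, can be illustrated with a minimal sketch. This is not the paper's code (the paper works in R); it is a standard-library Python illustration on toy data. The function names `random_undersample` and `classify_with_threshold`, the toy probabilities, and the choice of the minority-class share as the shifted threshold are all assumptions made here for illustration.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def classify_with_threshold(probs, threshold):
    """Label an observation as the rare class (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy unbalanced data: 95 majority-class rows (0) and 5 minority-class rows (1).
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# Option 1: balance the training set by random under-sampling.
X_bal, y_bal = random_undersample(X, y)
counts = Counter(y_bal)

# Option 2: keep the original data and shift the decision threshold.
# Assumption: the minority-class share is used as the shifted threshold.
prior = sum(y) / len(y)
probs = [0.02, 0.08, 0.30, 0.60]  # made-up predicted probabilities of class 1
default_labels = classify_with_threshold(probs, 0.5)
shifted_labels = classify_with_threshold(probs, prior)
```

With the default threshold of 0.5 only the last toy observation is flagged as a rare event, while the shifted threshold flags three of the four. In practice either option would be paired with a classifier such as regularized logistic regression or a random forest.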
Keywords: class unbalance; resampling; regularized logistic regression; random forests
JEL-codes: C1 C38 C52
Date: 2020
Downloads: https://doi.org/10.15611/eada.2020.1.04 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:eaiada:v:24:y:2020:i:1:p:39-50:n:4
DOI: 10.15611/eada.2020.1.04
Econometrics. Advances in Applied Data Analysis is currently edited by Józef Dziechciarz
More articles in Econometrics. Advances in Applied Data Analysis from Sciendo
Bibliographic data for series maintained by Peter Golla.