Filtering-Based Instance Selection Method for Overlapping Problem in Imbalanced Datasets
Marcio Rubbo and
Leandro A. Silva
Additional contact information
Marcio Rubbo: Graduate Program in Electrical Engineering and Computing, Mackenzie Presbyterian University, Rua da Consolação, 896, Prédio 30, Consolação, São Paulo 01302-907, Brazil
Leandro A. Silva: Graduate Program in Electrical Engineering and Computing, Mackenzie Presbyterian University, Rua da Consolação, 896, Prédio 30, Consolação, São Paulo 01302-907, Brazil
J, 2021, vol. 4, issue 3, 1-20
Abstract:
The overlapping problem occurs when a region of the dimensional data space is shared in a similar proportion by different classes. It has an impact on a classifier’s performance due to the difficulty in correctly separating the classes. Further, an imbalanced dataset consists of a situation in which one class has more instances than another, and this is another aspect that impacts a classifier’s performance. In general, these two problems are treated separately. On the other hand, Prototype Selection (PS) approaches are employed as strategies for selecting appropriate instances from a dataset by filtering redundant and noise data, which can cause misclassification performance. In this paper, we introduce Filtering-based Instance Selection (FIS), using as a base the Self-Organizing Maps Neural Network (SOM) and information entropy. In this sense, SOM is trained with a dataset, and, then, the instances of the training set are mapped to the nearest prototype (SOM neurons). An analysis with entropy is conducted in each prototype region. From a threshold, we propose three decision methods: filtering the majority class (H-FIS (High Filter IS)), the minority class (L-FIS (Low Filter IS)), and both classes (B-FIS). The experiments using artificial and real dataset showed that the methods proposed in combination with 1NN improved the accuracy, F-Score, and G-mean values when compared with the 1NN classifier without the filter methods. The FIS approach is also compatible with the approaches mentioned in the relevant literature.
Keywords: prototype selection; self-organizing maps; imbalanced datasets; overlapping problem (search for similar items in EconPapers)
JEL-codes: I1 I10 I12 I13 I14 I18 I19 (search for similar items in EconPapers)
Date: 2021
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2571-8800/4/3/24/pdf (application/pdf)
https://www.mdpi.com/2571-8800/4/3/24/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jjopen:v:4:y:2021:i:3:p:24-327:d:591448
Access Statistics for this article
J is currently edited by Ms. Angelia Su
More articles in J from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().