Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets
Meryem Pulat and
İpek Deveci Kocakoç
Operations Research and Decisions, 2024, vol. 34, issue 4, 157-183
Abstract:
The efficacy of machine learning algorithms depends significantly on the adequacy and relevance of the features in the dataset, so feature selection precedes classification. This study employs a hybrid feature selection approach that integrates filter and wrapper methods. The approach improves classification accuracy beyond what filter methods alone achieve, and it reduces processing time relative to relying exclusively on wrapper methods. Results indicate a general improvement in algorithm performance when the hybrid feature selection approach is applied. The study uses the Taiwanese Bankruptcy and Statlog (German Credit Data) datasets from the UCI Machine Learning Repository. Both datasets have an unbalanced class distribution, so data preprocessing took this unbalance into account before feature selection and the subsequent classification were carried out.
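The hybrid idea in the abstract (a cheap filter to shortlist features, then a costlier wrapper search over the shortlist only) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the filter statistic, the wrapper strategy, the classifier, or the treatment of the class unbalance. Here the filter is a standardized difference of class means, the wrapper is greedy forward selection, the classifier is a nearest-centroid rule, the wrapper is scored with balanced accuracy so the minority class counts equally, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def make_imbalanced(seed, n=300, n_minority=30, n_features=8, informative=(0, 3)):
    """Synthetic stand-in data: only the `informative` columns shift with the label."""
    rng = random.Random(seed)
    y = [1] * n_minority + [0] * (n - n_minority)
    rng.shuffle(y)
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
    return X, y


def filter_scores(X, y):
    """Filter step (assumed statistic): |mean difference between classes| / overall std."""
    scores = []
    for j in range(len(X[0])):
        pos = [r[j] for r, t in zip(X, y) if t == 1]
        neg = [r[j] for r, t in zip(X, y) if t == 0]
        spread = pstdev([r[j] for r in X]) or 1.0
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def balanced_accuracy(X_tr, y_tr, X_va, y_va, cols):
    """Fit a nearest-centroid classifier on `cols`; return the mean per-class recall."""
    centroids = {}
    for c in (0, 1):
        rows = [r for r, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [mean(r[j] for r in rows) for j in cols]
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for r, t in zip(X_va, y_va):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        pred = min(dist, key=dist.get)
        totals[t] += 1
        hits[t] += (pred == t)
    return mean(hits[c] / totals[c] for c in (0, 1))


def forward_select(X_tr, y_tr, X_va, y_va, candidates):
    """Wrapper step: greedily add the feature that most improves the validation score."""
    chosen, best = [], 0.0
    improved = True
    while improved:
        improved = False
        pick = None
        for j in candidates:
            if j in chosen:
                continue
            s = balanced_accuracy(X_tr, y_tr, X_va, y_va, chosen + [j])
            if s > best:
                best, pick, improved = s, j, True
        if improved:
            chosen.append(pick)
    return chosen, best


def hybrid_select(X_tr, y_tr, X_va, y_va, keep=4):
    """Filter down to `keep` features, then run the wrapper on that shortlist only."""
    scores = filter_scores(X_tr, y_tr)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    candidates = ranked[:keep]
    chosen, score = forward_select(X_tr, y_tr, X_va, y_va, candidates)
    return candidates, chosen, score


X_train, y_train = make_imbalanced(seed=0)
X_valid, y_valid = make_imbalanced(seed=1)
candidates, chosen, score = hybrid_select(X_train, y_train, X_valid, y_valid)
print("filter shortlist:", candidates)
print("wrapper selection:", chosen)
print("balanced accuracy:", round(score, 3))
```

The wrapper evaluates a classifier for every candidate subset it tries, so restricting it to the filter's shortlist is where the time saving over a wrapper-only search comes from. The shortlist size `keep` is an arbitrary choice in this sketch.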
Keywords: machine learning; ensemble learning; classification; feature selection; unbalanced dataset
Date: 2024
Downloads:
https://ord.pwr.edu.pl/assets/papers_archive/ord2024vol34no4_10.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:wut:journl:v:34:y:2024:i:4:p:157-183:id:10
DOI: 10.37190/ord240410
Operations Research and Decisions is a series from Wroclaw University of Science and Technology, Faculty of Management.
Bibliographic data for series maintained by Adam Kasperski.