Feature Selection in Imbalanced Data
Firuz Kamalov,
Fadi Thabtah and
Ho Hon Leung
Additional contact information
Firuz Kamalov: Canadian University of Dubai
Fadi Thabtah: Manukau Institute of Technology
Ho Hon Leung: UAE University
Annals of Data Science, 2023, vol. 10, issue 6, No 5, 1527-1541
Abstract:
The traditional feature selection methods are not suitable for imbalanced data as they tend to be biased towards the majority class. This problem is particularly acute in the field of medical diagnostics and fraud detection where the class distribution is highly skewed. In this paper, we propose a novel filter approach using decision tree-based $F_1$-score. The $F_1$-score incorporates the accuracy with respect to the minority class data and hence is a good measure in the case of imbalanced data. In the proposed implementation, the $F_1$-score is calculated based on a 1-dimensional decision tree classifier resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications.
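The filter idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels with the minority class coded as 1, a depth-1 tree (a single threshold chosen by Gini impurity with majority-vote leaves), and scoring on the training data. The abstract does not state the tree depth or the evaluation protocol, and the function names `stump_predict` and `rank_features` as well as the toy data are hypothetical. Each feature is scored on its own by the F1-score, F1 = 2TP / (2TP + FP + FN), of a tree fitted to that single feature, and features are then ranked by that score.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score of the minority class (assumed to be labelled 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: Sequence[int]) -> int:
    """Majority label; ties go to class 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def stump_predict(x: Sequence[float], y: Sequence[int]) -> List[int]:
    """Fit a depth-1 tree on one feature and return its training predictions.

    Assumption: depth 1 and Gini splitting are illustrative choices only.
    """
    values = sorted(set(x))
    best_thr, best_impurity = None, gini(y)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for v, t in zip(x, y) if v <= thr]
        right = [t for v, t in zip(x, y) if v > thr]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best_impurity:
            best_thr, best_impurity = thr, impurity
    if best_thr is None:  # no split reduces impurity: predict the majority
        return [majority(y)] * len(y)
    left_label = majority([t for v, t in zip(x, y) if v <= best_thr])
    right_label = majority([t for v, t in zip(x, y) if v > best_thr])
    return [left_label if v <= best_thr else right_label for v in x]


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[Tuple[int, float]]:
    """Score every feature independently and sort by descending F1."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((j, f1_score(y, stump_predict(column, y))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: 8 majority samples, 2 minority samples.
X = [[0.1, 5], [0.2, 3], [0.3, 8], [0.4, 1], [0.5, 9],
     [0.6, 2], [0.7, 7], [0.8, 4], [2.0, 6], [2.1, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

ranking = rank_features(X, y)
print(ranking)
```

In this toy data, feature 0 separates the two classes with a single threshold, while feature 1 does not, so the ranking places feature 0 first. Because each feature is evaluated on its own with a one-dimensional split search, the cost grows linearly with the number of features, which is consistent with the low computational complexity the abstract claims.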
Keywords: Imbalanced data; Feature selection; Filter method; $F_1$-score; Big data; Data mining; Machine learning
Date: 2023
Downloads:
http://link.springer.com/10.1007/s40745-021-00366-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:10:y:2023:i:6:d:10.1007_s40745-021-00366-5
Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745
DOI: 10.1007/s40745-021-00366-5
Annals of Data Science is currently edited by Yong Shi
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.