Feature Selection in Imbalanced Data

Firuz Kamalov, Fadi Thabtah and Ho Hon Leung
Additional contact information
Firuz Kamalov: Canadian University of Dubai
Fadi Thabtah: Manukau Institute of Technology
Ho Hon Leung: UAE University

Annals of Data Science, 2023, vol. 10, issue 6, No 5, 1527-1541

Abstract: The traditional feature selection methods are not suitable for imbalanced data as they tend to be biased towards the majority class. This problem is particularly acute in the fields of medical diagnostics and fraud detection, where the class distribution is highly skewed. In this paper, we propose a novel filter approach using a decision tree-based $F_1$-score. The $F_1$-score incorporates the accuracy with respect to the minority class data and hence is a good measure in the case of imbalanced data. In the proposed implementation, the $F_1$-score is calculated based on a 1-dimensional decision tree classifier, resulting in a fast and effective feature evaluation method. Numerical experiments confirm that the proposed method achieves robust dimensionality reduction and accuracy results. In addition, the low computational complexity of the algorithm makes it a practical choice for big data applications.
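
The filter idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels with the minority class coded as 1, a depth-1 tree (a single Gini split) fitted on each feature separately, and an $F_1$-score, $F_1 = 2TP / (2TP + FP + FN)$, computed on the training data. The tree depth, the evaluation protocol, and the helper names (`f1_score`, `stump_predict`, `feature_f1_scores`, `rank_features`) are assumptions made for illustration only.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2TP / (2TP + FP + FN) with respect to the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: Sequence[int]) -> int:
    """Majority label of a leaf; ties go to class 0."""
    return int(2 * sum(labels) > len(labels))


def stump_predict(x: Sequence[float], y: Sequence[int]) -> List[int]:
    """Fit a depth-1 tree on one feature and return its training predictions."""
    n = len(x)
    values = sorted(set(x))
    best_impurity, best_thr = None, None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best_impurity is None or impurity < best_impurity:
            best_impurity, best_thr = impurity, thr
    if best_thr is None:  # constant feature: no split is possible
        return [majority(y)] * n
    left_label = majority([yi for xi, yi in zip(x, y) if xi <= best_thr])
    right_label = majority([yi for xi, yi in zip(x, y) if xi > best_thr])
    return [left_label if xi <= best_thr else right_label for xi in x]


def feature_f1_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each column of X by the F1 of a single-feature stump."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(f1_score(y, stump_predict(column, y)))
    return scores


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from highest to lowest F1-score."""
    scores = feature_f1_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Because each feature is scored independently with a one-dimensional model, the cost grows linearly with the number of features, which is consistent with the low computational complexity claimed in the abstract. A feature whose stump predicts only the majority class receives a score of 0, since no minority samples are recovered.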

Keywords: Imbalanced data; Feature selection; Filter method; $F_1$-score; Big data; Data mining; Machine learning
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s40745-021-00366-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:10:y:2023:i:6:d:10.1007_s40745-021-00366-5

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-021-00366-5

Annals of Data Science is currently edited by Yong Shi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:aodasc:v:10:y:2023:i:6:d:10.1007_s40745-021-00366-5