EconPapers    
Economics at your fingertips  
 

A Feature Selection Method Based on Ranked Vector Scores of Features for Classification

Firuz Kamalov () and Fadi Thabtah ()
Additional contact information
Firuz Kamalov: Canadian University of Dubai
Fadi Thabtah: University of Huddersfield

Annals of Data Science, 2017, vol. 4, issue 4, No 4, 483-502

Abstract: Abstract One of the major aspects of any classification process is selecting the relevant set of features to be used in a classification algorithm. This initial step in data analysis is called the feature selection process. Disposing of the irrelevant features from the dataset will reduce the complexity of the classification task and will increase the robustness of the decision rules when applied on the test set. This paper proposes a new filtering method that combines and normalizes the scores of three major feature selection methods: information gain, chi-squared statistic and inter-correlation. Our method utilizes the strengths of each of the aforementioned methods to maximum advantage while avoiding their drawbacks—especially the disparity of the results produced by these methods. Our filtering method stabilizes each variable score and gives it the true rank among the input data’s available variables. Hence it maximizes the stability in the variables’ scores without losing the overall accuracy of the predictive model. A number of experiments on different datasets from various domains have shown that features chosen by the proposed method are highly predictive when compared with features selected by other existing filtering methods. The evaluation of the filtering phase was conducted via thorough experimentations using a number of predictive classification algorithms in addition to statistical analysis of the filtering methods’ scores.

Keywords: Classification accuracy; Data mining; Dimensionality reduction; Feature selection; Predictive models; Ranking of features (search for similar items in EconPapers)
Date: 2017
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (5)

Downloads: (external link)
http://link.springer.com/10.1007/s40745-017-0116-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:4:y:2017:i:4:d:10.1007_s40745-017-0116-1

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-017-0116-1

Access Statistics for this article

Annals of Data Science is currently edited by Yong Shi

More articles in Annals of Data Science from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:aodasc:v:4:y:2017:i:4:d:10.1007_s40745-017-0116-1