Benchmark for filter methods for feature selection in high-dimensional classification data
Andrea Bommert,
Xudong Sun,
Bernd Bischl,
Jörg Rahnenführer and
Michel Lang
Computational Statistics & Data Analysis, 2020, vol. 143, issue C
Abstract:
Feature selection is one of the most fundamental problems in machine learning and has drawn increasing attention due to high-dimensional data sets emerging from different fields like bioinformatics. For feature selection, filter methods play an important role, since they can be combined with any machine learning model and can heavily reduce run time of machine learning algorithms. The aim of the analyses is to review how different filter methods work, to compare their performance with respect to both run time and predictive accuracy, and to provide guidance for applications. Based on 16 high-dimensional classification data sets, 22 filter methods are analyzed with respect to run time and accuracy when combined with a classification method. It is concluded that there is no group of filter methods that always outperforms all other methods, but recommendations on filter methods that perform well on many of the data sets are made. Also, groups of filters that are similar with respect to the order in which they rank the features are found. For the analyses, the R machine learning package mlr is used. It provides a uniform programming API and therefore is a convenient tool to conduct feature selection using filter methods.
Keywords: Feature selection; Filter methods; High-dimensional data; Benchmark (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (11)
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S016794731930194X
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:143:y:2020:i:c:s016794731930194x
DOI: 10.1016/j.csda.2019.106839
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().