Number of Instances for Reliable Feature Ranking in a Given Problem
Bohanec Marko,
Borštnar Mirjana Kljajić and
Robnik-Šikonja Marko
Additional contact information
Bohanec Marko: Salvirt Ltd., Ljubljana, Slovenia
Borštnar Mirjana Kljajić: Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia
Robnik-Šikonja Marko: Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Business Systems Research, 2018, vol. 9, issue 2, 35-44
Abstract:
Background: In practical use of machine learning models, users may add new features to an existing classification model to reflect their (changed) empirical understanding of a field. New features can increase the classification accuracy of the model or improve its interpretability. Objectives: We introduce a guideline for determining the sample size needed to reliably estimate the impact of a new feature. Methods/Approach: Our approach is based on the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals for feature ranks. Results: We test our approach using real-world qualitative business-to-business sales forecasting data and two UCI data sets, one with missing values. The results show that new features with a high or a low rank can be detected using a relatively small number of instances, but features ranked near the border of useful features need larger samples to determine their impact. Conclusions: A combination of the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals can be used to reliably estimate the impact of a new feature in a given problem.
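The idea of bootstrap confidence intervals for feature ranks can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes basic Relief (one nearest hit and one nearest miss, two classes, numeric features) for the full ReliefF measure, uses simple percentile intervals, and runs on synthetic data. All function names, parameters and data below are assumptions made for illustration only.

```python
import random

def relief_scores(X, y, ids=None):
    """Basic Relief (NOT full ReliefF): one nearest hit and one nearest miss."""
    n, d = len(X), len(X[0])
    ids = list(range(n)) if ids is None else ids
    # Scale each feature difference by the observed range of that feature.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w, used = [0.0] * d, 0
    for i in range(n):
        # Skip bootstrap copies of the same original instance as candidate hits.
        hits = [k for k in range(n) if y[k] == y[i] and ids[k] != ids[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += diff(X[i], X[m], j) - diff(X[i], X[h], j)
        used += 1
    return [v / used for v in w] if used else w

def ranks(scores):
    """Rank 1 is the feature with the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        r[j] = pos
    return r

def bootstrap_rank_intervals(X, y, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the rank of every feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    samples = [[] for _ in range(d)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        r = ranks(relief_scores([X[i] for i in idx], [y[i] for i in idx], idx))
        for j in range(d):
            samples[j].append(r[j])
    intervals = []
    for s in samples:
        s.sort()
        lo = s[int((alpha / 2) * (len(s) - 1))]
        hi = s[int(round((1 - alpha / 2) * (len(s) - 1)))]
        intervals.append((lo, hi))
    return intervals

# Synthetic data (assumption): feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[rng.uniform(0.7, 1.0) if c else rng.uniform(0.0, 0.3), rng.random(), rng.random()]
     for c in y]
intervals = bootstrap_rank_intervals(X, y, n_boot=100)
for j, (lo, hi) in enumerate(intervals):
    print(f"feature {j}: rank interval [{lo}, {hi}]")
```

One way to use such a sketch, in the spirit of the guideline described above, is to repeat the computation on growing subsets of the data and watch how the rank interval of the new feature narrows; a feature whose interval straddles the border of useful features would then call for more instances.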
Keywords: machine learning; feature ranking; feature evaluation
Date: 2018
Persistent link: https://EconPapers.repec.org/RePEc:bit:bsrysr:v:9:y:2018:i:2:p:35-44:n:4
DOI: 10.2478/bsrj-2018-0017
Business Systems Research is currently edited by Mirjana Pejić Bach
Bibliographic data for series maintained by Peter Golla.