Facilitating high‐dimensional transparent classification via empirical Bayes variable selection
Haim Bar, James Booth, Martin T. Wells and Kangyan Liu
Applied Stochastic Models in Business and Industry, 2018, vol. 34, issue 6, 949-961
Abstract:
We present a two‐step approach to classification problems in the “large P, small N” setting, where the number of predictors may be larger than the sample size. We assume that the association between the predictors and the class variable has an approximate linear‐logistic form, but we allow the class boundaries to be nonlinear. We further assume that the number of true predictors is relatively small. In the first step, we use a binomial generalized linear model to identify which predictors are associated with each class. In the second step, we restrict the data set to these predictors and run a nonlinear classifier, such as a random forest or a support vector machine. We show that, without the variable screening step, the classification performance of both the random forest and the support vector machine degrades when many of the P predictors are unrelated to the class.
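The screen-then-classify idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' method: a per-predictor score test for the slope of a one-predictor logistic regression stands in for their empirical Bayes selection, a fixed top-k cutoff (`KEEP`) stands in for their selection rule, and a plain k-nearest-neighbour vote stands in for the random forest or support vector machine. The simulated data, the function names, and all constants are assumptions made for illustration only.

```python
import math
import random

def score_z(x, y):
    """Score statistic for the slope in a one-predictor logistic regression
    (a simple stand-in for the paper's empirical Bayes GLM screening)."""
    n = len(y)
    p0 = sum(y) / n                      # fitted class probability under the null
    xbar = sum(x) / n
    u = sum(xi * (yi - p0) for xi, yi in zip(x, y))
    v = p0 * (1 - p0) * sum((xi - xbar) ** 2 for xi in x)
    return u / math.sqrt(v) if v > 0 else 0.0

def screen(X, y, keep):
    """Step 1: rank predictors by |z| and keep the top `keep` column indices."""
    p = len(X[0])
    z = [abs(score_z([row[j] for row in X], y)) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: -z[j])[:keep])

def knn_predict(X_train, y_train, x, cols, k=5):
    """Step 2 stand-in: majority vote among the k nearest training points,
    using only the columns in `cols`."""
    dist = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dist[:k])
    return 1 if 2 * votes > k else 0

def accuracy(X_train, y_train, X_test, y_test, cols):
    hits = sum(
        knn_predict(X_train, y_train, x, cols) == label
        for x, label in zip(X_test, y_test)
    )
    return hits / len(y_test)

def simulate(n, p, informative, shift, rng):
    """Hypothetical data: only the `informative` columns depend on the class."""
    X, y = [], []
    for i in range(n):
        label = i % 2                    # balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in informative:
            row[j] += shift if label == 1 else -shift
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
P, KEEP = 200, 2                         # many predictors, few of them relevant
X_train, y_train = simulate(80, P, [0, 1], 2.5, rng)
X_test, y_test = simulate(40, P, [0, 1], 2.5, rng)

selected = screen(X_train, y_train, KEEP)
acc_all = accuracy(X_train, y_train, X_test, y_test, range(P))
acc_screened = accuracy(X_train, y_train, X_test, y_test, selected)
print("selected predictors:", selected)
print("accuracy with all predictors:", acc_all)
print("accuracy after screening:", acc_screened)
```

Comparing the two printed accuracies shows, on this toy data, how much the irrelevant predictors affect the classifier. The size of that effect here says nothing about the results reported in the article.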
Date: 2018
Downloads: https://doi.org/10.1002/asmb.2393
Persistent link: https://EconPapers.repec.org/RePEc:wly:apsmbi:v:34:y:2018:i:6:p:949-961