Facilitating high‐dimensional transparent classification via empirical Bayes variable selection

Bar, Haim; Booth, James; Wells, Martin T.; Liu, Kangyan

Applied Stochastic Models in Business and Industry, 2018, vol. 34, issue 6, 949-961

Abstract: We present a two‐step approach to classification problems in the “large P, small N” setting, where the number of predictors may be larger than the sample size. We assume that the association between the predictors and the class variable has an approximate linear‐logistic form, but we allow the class boundaries to be nonlinear. We further assume that the number of true predictors is relatively small. In the first step, we use a binomial generalized linear model to identify which predictors are associated with each class. In the second step, we restrict the data set to these predictors and run a nonlinear classifier, such as a random forest or a support vector machine. We show that, without the variable screening step, the classification performance of both the random forest and the support vector machine is degraded when many of the P predictors are unrelated to the class.
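The two steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Three things are assumptions made for illustration: the screening step uses a marginal score test from a one-predictor binomial GLM rather than the empirical Bayes selection named in the title; a k-nearest-neighbour classifier stands in for the random forest or support vector machine; and the simulated data, the cutoff of 4.0, and all function names are hypothetical.

```python
import math
import random


def score_z(x, y):
    """Score-test z statistic for H0: slope = 0 in the binomial GLM
    logit P(y = 1) = b0 + b1 * x, applied to one predictor at a time."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    num = sum((yi - y_bar) * xi for xi, yi in zip(x, y))
    var = y_bar * (1.0 - y_bar) * sum((xi - x_bar) ** 2 for xi in x)
    return num / math.sqrt(var) if var > 0 else 0.0


def screen(X, y, threshold):
    """Step 1 (assumed simplification): keep columns with |z| > threshold."""
    p = len(X[0])
    return [j for j in range(p)
            if abs(score_z([row[j] for row in X], y)) > threshold]


def knn_predict(X_train, y_train, x_new, cols, k=5):
    """Step 2 stand-in: majority vote among the k nearest training rows,
    with squared Euclidean distance computed on the columns in `cols`."""
    dists = sorted(
        (sum((row[j] - x_new[j]) ** 2 for j in cols), label)
        for row, label in zip(X_train, y_train)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def accuracy(X_train, y_train, X_test, y_test, cols):
    """Fraction of test rows classified correctly using only `cols`."""
    hits = sum(knn_predict(X_train, y_train, x, cols) == t
               for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def simulate(n, p, rng):
    """Hypothetical data: only columns 0 and 1 carry signal, and the
    interaction term makes the class boundary curved."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if r[0] + r[1] + 0.5 * r[0] * r[1] + rng.gauss(0.0, 0.3) > 0 else 0
         for r in X]
    return X, y


def run_demo(seed=0, n_train=200, n_test=100, p=300, threshold=4.0):
    """Compare the classifier on all P columns with the screened subset."""
    rng = random.Random(seed)
    X_train, y_train = simulate(n_train, p, rng)
    X_test, y_test = simulate(n_test, p, rng)
    selected = screen(X_train, y_train, threshold)
    acc_all = accuracy(X_train, y_train, X_test, y_test, range(p))
    acc_sel = accuracy(X_train, y_train, X_test, y_test, selected)
    return selected, acc_all, acc_sel


selected, acc_all, acc_sel = run_demo()
print("selected columns:", selected)
print("accuracy, all predictors:", acc_all)
print("accuracy, screened predictors:", acc_sel)
```

The sketch sets P = 300 and N = 200 so that predictors outnumber training samples, matching the setting the abstract describes. Comparing the two printed accuracies gives a rough sense of the effect of the screening step on this one simulated data set. It does not reproduce the paper's experiments.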

Date: 2018

Downloads: (external link)
https://doi.org/10.1002/asmb.2393

Persistent link: https://EconPapers.repec.org/RePEc:wly:apsmbi:v:34:y:2018:i:6:p:949-961

More articles in Applied Stochastic Models in Business and Industry from John Wiley & Sons