Nonparametric variable selection and classification: The CATCH algorithm
Shijie Tang, Lisha Chen, Kam-Wah Tsui and Kjell Doksum
Computational Statistics & Data Analysis, 2014, vol. 72, issue C, 158-175
Abstract:
The problem of classifying a categorical response Y is considered in a nonparametric framework. The distribution of Y depends on a vector of predictors X, where the coordinates Xj of X may be continuous, discrete, or categorical. An algorithm is constructed to select the variables to be used for classification. For each variable Xj, an importance score sj is computed to measure the strength of association between Xj and Y. The algorithm deletes Xj if sj falls below a certain threshold. Monte Carlo simulations show that the algorithm has a high probability of selecting only variables associated with Y. Moreover, when this variable selection rule is used for dimension reduction before classification procedures are applied, it improves the performance of those procedures. The importance scores are based on root chi-square type statistics computed for randomly selected regions (tubes) of the sample space. The size and shape of the regions are adjusted iteratively and adaptively using the data, to enhance the ability of the importance score to detect local relationships between the response and the predictors. These local scores are then averaged over the tubes to form a global importance score sj for variable Xj. When confounding and spurious associations are issues, the nonparametric importance score for variable Xj is computed conditionally by using tubes to restrict the other variables. This variable selection procedure is called CATCH (Categorical Adaptive Tube Covariate Hunting). Asymptotic properties, including consistency, are established.
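The scoring idea in the abstract can be illustrated with a small sketch. This is not the authors' CATCH implementation: it assumes continuous predictors, uses tubes of a fixed radius rather than adaptively sized ones, and takes as the local score the square root of a Pearson chi-square statistic for a median split of Xj against Y. The function names, the radius, the number of tubes, and the threshold are all hypothetical choices made for illustration.

```python
import math
import random


def root_chi_square(xs, ys):
    """Square root of the Pearson chi-square statistic for a 2 x K table:
    xs split at its median (low/high) against the class labels ys."""
    n = len(xs)
    if n < 4:
        return 0.0
    cut = sorted(xs)[n // 2]
    classes = sorted(set(ys))
    # counts[row][label]: row 0 = below the cut, row 1 = at or above it
    counts = [{c: 0 for c in classes} for _ in range(2)]
    for x, y in zip(xs, ys):
        counts[int(x >= cut)][y] += 1
    row_tot = [sum(r.values()) for r in counts]
    col_tot = {c: counts[0][c] + counts[1][c] for c in classes}
    chi2 = 0.0
    for r in range(2):
        for c in classes:
            expected = row_tot[r] * col_tot[c] / n
            if expected > 0:
                chi2 += (counts[r][c] - expected) ** 2 / expected
    return math.sqrt(chi2)


def importance_score(X, y, j, n_tubes=50, radius=0.5, rng=None):
    """Average local score for column j over randomly centered tubes.
    Assumption: a tube keeps the points whose OTHER coordinates lie within
    a fixed `radius` of a randomly chosen sample point (no adaptation)."""
    rng = rng or random.Random(0)
    others = [k for k in range(len(X[0])) if k != j]
    scores = []
    for _ in range(n_tubes):
        center = X[rng.randrange(len(X))]
        idx = [i for i, row in enumerate(X)
               if all(abs(row[k] - center[k]) <= radius for k in others)]
        scores.append(root_chi_square([X[i][j] for i in idx],
                                      [y[i] for i in idx]))
    return sum(scores) / len(scores)


def select_variables(X, y, threshold):
    """Keep the column indices whose score is at least `threshold`."""
    scores = [importance_score(X, y, j) for j in range(len(X[0]))]
    return [j for j, s in enumerate(scores) if s >= threshold], scores


# Toy data: only column 0 is associated with the response.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(400)]
y = [int(row[0] > 0.5) for row in X]
selected, scores = select_variables(X, y, threshold=5.0)
print(selected, [round(s, 2) for s in scores])
```

In this toy setting the score for column 0 should be far larger than the scores for the two noise columns, so a threshold between them retains only the relevant variable. The threshold of 5.0 was picked by hand for this example and is not a rule taken from the paper.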
Keywords: Adaptive variable selection; Importance score; Chi-square statistic
Date: 2014
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947313003782
Full text for ScienceDirect subscribers only.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:72:y:2014:i:c:p:158-175
DOI: 10.1016/j.csda.2013.10.024
Computational Statistics & Data Analysis is currently edited by S.P. Azen
Bibliographic data for series maintained by Catherine Liu.