Wasserstein filter for variable screening in binary classification in the reproducing kernel Hilbert space
Sanghun Jeong, Choongrak Kim and Hojin Yang
Journal of Nonparametric Statistics, 2024, vol. 36, issue 3, 623-642
Abstract:
This paper develops a marginal screening method for high-dimensional binary classification based on the Wasserstein distance, which accounts for distributional differences between the two classes. Many existing screening methods, such as the two-sample t-test and the Kolmogorov test, reduce the dimension of the predictors under parametric or nonparametric modeling assumptions. However, these modeling specifications and nonparametric approaches are tied to the probability measure induced by the predictor in a Euclidean space. Because many machine learning methods have successfully found nonlinear decision boundaries in a transformed space, the reproducing kernel Hilbert space (RKHS), we consider the Wasserstein filter's capacity to detect the distributional difference between two probability measures induced by a nonlinear function of the predictor in the RKHS. This lets us flexibly filter out predictors that are non-informative for the binary classification while avoiding the modeling assumptions required in a Euclidean space. We prove that the Wasserstein filter satisfies the sure screening property under some mild conditions. We also demonstrate the advantages of the proposed approach by comparing its finite-sample performance with that of existing methods in simulation studies and in an application to lung cancer data.
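To make the idea of marginal screening by Wasserstein distance concrete, the following is a minimal sketch and not the authors' procedure. It scores each predictor by the empirical Wasserstein-1 distance between its two class-conditional samples and keeps the d highest-scoring predictors. It works directly in Euclidean space and assumes both classes are non-empty; the RKHS step described in the abstract, where the distance is taken between measures induced by a nonlinear function of the predictor, is not reproduced here. The names `wasserstein_1d`, `wasserstein_screen` and the cutoff `d` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right


def wasserstein_1d(x, y):
    """Empirical Wasserstein-1 distance between two samples on the real line.

    Uses the identity W1 = integral of |F_x(t) - F_y(t)| dt, where F_x and F_y
    are the empirical distribution functions of the two samples.
    """
    xs, ys = sorted(x), sorted(y)
    grid = sorted(xs + ys)
    total = 0.0
    # Both empirical CDFs are constant between consecutive pooled points.
    for left, right in zip(grid, grid[1:]):
        fx = bisect_right(xs, left) / len(xs)
        fy = bisect_right(ys, left) / len(ys)
        total += abs(fx - fy) * (right - left)
    return total


def wasserstein_screen(X, y, d):
    """Keep the d predictors with the largest class-conditional distance.

    X is a list of rows (one list of predictor values per observation) and
    y is a list of 0/1 labels. Assumes both classes are present.
    Returns the sorted indices of the retained predictors and all scores.
    """
    p = len(X[0])
    scores = []
    for j in range(p):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(wasserstein_1d(x0, x1))
    keep = sorted(range(p), key=lambda j: scores[j], reverse=True)[:d]
    return sorted(keep), scores


# Toy data: predictor 0 differs between the classes, predictor 1 does not.
X_toy = [[0.0, 1.0], [0.1, 2.0], [5.0, 1.0], [5.1, 2.0]]
y_toy = [0, 0, 1, 1]
keep_toy, scores_toy = wasserstein_screen(X_toy, y_toy, d=1)
# keep_toy == [0]
```

Because the score compares whole distributions rather than means, a predictor whose classes differ only in spread or shape can still receive a nonzero score, which is the motivation the abstract gives for using a distributional distance.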
Date: 2024
Downloads: http://hdl.handle.net/10.1080/10485252.2023.2235430 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:gnstxx:v:36:y:2024:i:3:p:623-642
DOI: 10.1080/10485252.2023.2235430
Journal of Nonparametric Statistics is currently edited by Jun Shao