Feature Screening with Conditional Rank Utility for Big-Data Classification
Xingxiang Li and Chen Xu
Journal of the American Statistical Association, 2024, vol. 119, issue 546, 1385-1395
Abstract:
Feature screening is a commonly used strategy to eliminate irrelevant features in high-dimensional classification. When one encounters big datasets with both high dimensionality and huge sample size, the conventional screening methods become computationally costly or even infeasible. In this article, we introduce a novel screening utility, Conditional Rank Utility (CRU), and propose a distributed feature screening procedure for the big-data classification. The proposed CRU effectively quantifies the significance of a numerical feature on the categorical response. Since CRU is constructed based on the ratio of the mean conditional rank to the mean unconditional rank of a feature, it is robust against model misspecification and the presence of outliers. Structurally, CRU can be expressed as a simple function of a few component parameters, each of which can be distributively estimated using a natural unbiased estimator from the data segments. Under mild conditions, we show that the distributed estimator of CRU is fully efficient in terms of the probability convergence bound and the mean squared error rate; the corresponding distributed screening procedure enjoys the sure screening and ranking properties. The promising performances of the CRU-based screening are supported by extensive numerical examples. Supplementary materials for this article are available online.
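To make the rank-ratio idea in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' estimator. The abstract does not give the exact CRU formula, so the function names (`segment_components`, `distributed_rank_ratio`), the within-segment rank scaling, and the class-weighted absolute deviation used as the final score are all hypothetical choices made for illustration. Each data segment contributes only per-class rank sums and class counts, which are then pooled and combined into a ratio of mean conditional rank to mean unconditional rank.

```python
import numpy as np

def segment_components(x, y, classes):
    """Per-segment summaries: scaled rank sums and counts for each class.

    Assumption: ranks are computed within the segment and divided by the
    segment size; the paper may define its component parameters differently.
    """
    m = len(x)
    ranks = np.empty(m)
    # Ordinal ranks 1..m; ties are broken by position, which is adequate
    # for continuous features.
    ranks[np.argsort(x, kind="stable")] = np.arange(1, m + 1)
    scaled = ranks / m
    rank_sums = np.array([scaled[y == k].sum() for k in classes])
    counts = np.array([(y == k).sum() for k in classes])
    return rank_sums, counts

def distributed_rank_ratio(x, y, n_segments):
    """Illustrative screening score for one feature (hypothetical form).

    Pools the per-segment components, forms the ratio of each class's mean
    rank to the overall mean rank, and returns the class-weighted absolute
    deviation of that ratio from 1. Larger values suggest a stronger
    association between the feature and the class label.
    """
    classes = np.unique(y)
    rank_sums = np.zeros(len(classes))
    counts = np.zeros(len(classes))
    for idx in np.array_split(np.arange(len(x)), n_segments):
        s, c = segment_components(x[idx], y[idx], classes)
        rank_sums += s
        counts += c
    cond_mean = rank_sums / counts                 # mean conditional rank per class
    uncond_mean = rank_sums.sum() / counts.sum()   # mean unconditional rank
    weights = counts / counts.sum()
    return float(np.sum(weights * np.abs(cond_mean / uncond_mean - 1.0)))

if __name__ == "__main__":
    # Synthetic data: feature 0 depends on the label, the others are noise.
    rng = np.random.default_rng(0)
    n, p = 2000, 5
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[:, 0] += 1.5 * y
    scores = [distributed_rank_ratio(X[:, j], y, n_segments=8) for j in range(p)]
    # Screening would keep the features with the largest scores.
    print(np.argsort(scores)[::-1])
```

Because the score depends on the feature only through its ranks, a few extreme values cannot dominate it, which is the intuition behind the robustness claim in the abstract. The pooling step needs only a handful of numbers from each segment, so segments could be processed on separate machines.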
Date: 2024
Downloads: (external link)
http://hdl.handle.net/10.1080/01621459.2023.2195976 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:119:y:2024:i:546:p:1385-1395
DOI: 10.1080/01621459.2023.2195976
Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson