Feature Screening for Massive Data Analysis by Subsampling
Xuening Zhu, Rui Pan, Shuyuan Wu and Hansheng Wang
Journal of Business & Economic Statistics, 2022, vol. 40, issue 4, 1892-1903
Abstract:
Modern statistical analysis often encounters massive datasets with ultrahigh-dimensional features. In this work, we develop a subsampling approach for feature screening with massive datasets. The approach is implemented by repeated subsampling of the massive data and can be used for analysis tasks under memory constraints. To conduct the procedure, we first calculate an R-squared screening measure (and related sample moments) based on subsamples. Second, we consider three methods to combine the local statistics. In addition to the simple average method, we design a jackknife debiased screening measure and an aggregated moment screening measure. Both of these approaches reduce the bias of the subsampling screening measure and therefore increase the accuracy of the feature screening. Last, we consider a novel sequential sampling method that is more computationally efficient than the traditional random sampling method. The theoretical properties of the three screening measures under both sampling schemes are rigorously discussed. Finally, we illustrate the usefulness of the proposed method with an airline dataset containing 32.7 million records.
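The subsample-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the R-squared screening measure is the squared marginal sample correlation between each feature and the response, and that "aggregated moment" means averaging first and second sample moments across subsamples before forming R-squared. The function names (`subsample_moments`, `r2_from_moments`, `screen`) and parameters (`n_sub`, `n_rep`) are hypothetical. The jackknife debiased measure and the sequential sampling scheme are not shown.

```python
import numpy as np

def subsample_moments(x, y):
    """Per-feature sample moments on one subsample (x: n x p, y: length n)."""
    p = x.shape[1]
    return np.array([
        x.mean(axis=0),                  # mean of x_j
        (x ** 2).mean(axis=0),           # mean of x_j^2
        (x * y[:, None]).mean(axis=0),   # mean of x_j * y
        np.full(p, y.mean()),            # mean of y (repeated per feature)
        np.full(p, (y ** 2).mean()),     # mean of y^2 (repeated per feature)
    ])

def r2_from_moments(m):
    """Marginal R-squared (squared correlation) computed from the moments."""
    mx, mxx, mxy, my, myy = m
    cov = mxy - mx * my
    return cov ** 2 / ((mxx - mx ** 2) * (myy - my ** 2))

def screen(X, y, n_sub, n_rep, rng):
    """Return (simple-average R2, aggregated-moment R2) for each feature."""
    N, p = X.shape
    avg_r2 = np.zeros(p)
    agg_m = np.zeros((5, p))
    for _ in range(n_rep):
        # Random subsampling without replacement; only n_sub rows are
        # touched per round, which is what keeps memory use bounded.
        idx = rng.choice(N, size=n_sub, replace=False)
        m = subsample_moments(X[idx], y[idx])
        avg_r2 += r2_from_moments(m) / n_rep   # average the local R2 values
        agg_m += m / n_rep                     # average the local moments
    return avg_r2, r2_from_moments(agg_m)

# Toy usage on simulated data (assumed setup, not the airline dataset).
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 20))
y = X[:, 0] - X[:, 1] + rng.standard_normal(5000)
avg_r2, agg_r2 = screen(X, y, n_sub=200, n_rep=30, rng=rng)
top = np.argsort(agg_r2)[::-1][:2]   # indices of the two highest-ranked features
```

In this sketch, averaging local R-squared values keeps the small-sample bias of each subsample, whereas forming R-squared once from pooled moments behaves like an estimate from a much larger sample, which is one way to read the bias reduction claimed in the abstract.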
Date: 2022
Downloads: http://hdl.handle.net/10.1080/07350015.2021.1990771 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlbes:v:40:y:2022:i:4:p:1892-1903
Ordering information: This journal article can be ordered from http://www.tandfonline.com/pricing/journal/UBES20
DOI: 10.1080/07350015.2021.1990771
Journal of Business & Economic Statistics is currently edited by Eric Sampson, Rong Chen and Shakeeb Khan
Bibliographic data for series maintained by Chris Longhurst.