Model-free feature screening based on Hellinger distance for ultrahigh dimensional data
Jiujing Wu and
Hengjian Cui
Additional contact information
Jiujing Wu: Capital Normal University
Hengjian Cui: Capital Normal University
Statistical Papers, 2024, vol. 65, issue 9, No 19, 5903-5930
Abstract:
With the explosive development of data acquisition and processing technology, the number of features can grow exponentially with the sample size, which poses significant challenges for data analysis. It is crucial to accurately identify the useful features among thousands of candidates. In this paper, we develop an omnibus model-free feature screening procedure based on the Hellinger distance, which offers several appealing merits. First, we define a Hellinger distance index for discrete response variables in discriminant analysis. Second, the procedure extends to continuous response variables, which are discretized using a slice-and-fuse technique. Third, it is robust against potential outliers and model misspecification. Theoretically, for both discrete and continuous response variables, the procedure enjoys the sure screening and ranking consistency properties under mild conditions. Numerical studies show that the procedure is highly competitive on heavy-tailed and skewed data while remaining comparable to existing approaches on light-tailed data, indicating robust performance across various data types. Two real data sets, one with a discrete and the other with a continuous response variable, demonstrate the effectiveness of the proposed method.
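The screening idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' estimator: the full text is not reproduced here, so every concrete choice below is an assumption. For a discrete response Y with classes r and proportions p_r, each predictor X_k is scored by ω_k = Σ_r p_r H²(P_{X_k|Y=r}, P_{X_k}), where H²(P, Q) = 1 − Σ_j √(p_j q_j) is the squared Hellinger distance and both distributions are estimated on quantile bins of X_k. For a continuous response, Y is cut into quantile slices for several slice counts and the resulting indices are summed, which is one simple reading of "slice-and-fuse". The function names (quantile_bins, hd_index_discrete, hd_index_continuous, screen), the bin count and the slice counts are all hypothetical.

```python
import numpy as np


def quantile_bins(v, n_bins):
    """Label each value of v with a bin 0..n_bins-1 cut at empirical quantiles.

    Binning by rank means an outlier only affects which bin it lands in,
    not the statistic's scale (an assumption of this sketch).
    """
    edges = np.quantile(v, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, v, side="right")


def hellinger_sq(p, q):
    """Squared Hellinger distance between two pmfs: 1 - sum_j sqrt(p_j * q_j)."""
    # Clip tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - float(np.sum(np.sqrt(p * q))))


def hd_index_discrete(x, y, n_bins=10):
    """Hypothetical index: sum over classes r of P(Y=r) * H^2(P(X | Y=r), P(X)).

    Conditional and marginal distributions share the same quantile bins of x.
    """
    xb = quantile_bins(x, n_bins)
    marginal = np.bincount(xb, minlength=n_bins) / len(xb)
    index = 0.0
    for r in np.unique(y):
        mask = y == r
        cond = np.bincount(xb[mask], minlength=n_bins) / mask.sum()
        index += mask.mean() * hellinger_sq(cond, marginal)
    return index


def hd_index_continuous(x, y, slice_counts=(3, 4, 5), n_bins=10):
    """Slice a continuous y at its quantiles for several slice counts, then
    fuse the resulting discrete-response indices by summing them."""
    return sum(hd_index_discrete(x, quantile_bins(y, s), n_bins) for s in slice_counts)


def screen(X, y, d, continuous=False):
    """Rank the columns of X by the index; return the top-d indices and all scores."""
    score = hd_index_continuous if continuous else hd_index_discrete
    omega = np.array([score(X[:, k], y) for k in range(X.shape[1])])
    return np.argsort(omega)[::-1][:d], omega


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 1000
    X = rng.standard_t(df=2, size=(n, p))  # heavy-tailed predictors
    y = (X[:, 0] - X[:, 1] + rng.standard_normal(n) > 0).astype(int)
    top, omega = screen(X, y, d=int(n / np.log(n)))
    print("selected:", top[:10])
```

Passing continuous=True switches to the fused slicing version. The submodel size d = ⌊n / log n⌋ used in the demo is a convention common in the screening literature, not a value taken from this paper, and the bin and slice counts are tuning choices of the sketch.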
Keywords: Ultrahigh dimensionality; Hellinger distance; Model-free; Sure screening property; 62G99; 62H20; 62H30
Date: 2024
Downloads: http://link.springer.com/10.1007/s00362-024-01615-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:stpapr:v:65:y:2024:i:9:d:10.1007_s00362-024-01615-4
Ordering information: This journal article can be ordered from
http://www.springer. ... business/journal/362
DOI: 10.1007/s00362-024-01615-4
Statistical Papers is currently edited by C. Müller, W. Krämer and W.G. Müller
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.