Fast Optimal Subsampling Probability Approximation for Generalized Linear Models

JooChul Lee, Elizabeth D. Schifano and HaiYing Wang

Econometrics and Statistics, 2024, vol. 29, issue C, 224-237

Abstract: For massive data, subsampling techniques are a popular way to mitigate the computational burden by reducing the data size. In a subsampling approach, subsampling probabilities for each data point are specified to obtain an informative sub-data, and then estimates based on the sub-data are obtained to approximate estimates from the full data. Assigning subsampling probabilities by minimizing the asymptotic mean squared error of the estimator from a general subsample (the A-optimality criterion) is a popular approach; however, it is still computationally demanding to calculate the probabilities under this setting. To efficiently approximate the A-optimal subsampling probabilities for generalized linear models, randomized algorithms are proposed. To develop the algorithms, the Johnson-Lindenstrauss Transform and the Subsampled Randomized Hadamard Transform are used. Additionally, optimal subsampling probabilities are derived for the Gaussian linear model in the case where both the regression coefficients and the dispersion parameter are of interest, and algorithms are developed to approximate these optimal subsampling probabilities. Simulation studies indicate that the estimators based on the developed algorithms perform very well for statistical inference and yield substantial savings in computing time compared to the direct calculation of the A-optimal subsampling probabilities.
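
Illustration (not part of the original record): the following is a minimal sketch of the general idea the abstract describes, not the authors' algorithm. It assumes a logistic regression model, for which A-optimal subsampling probabilities are commonly written as proportional to |y_i - p_i| * ||M^{-1} x_i||, and it approximates the norms with a plain Gaussian Johnson-Lindenstrauss projection; the Subsampled Randomized Hadamard Transform is not shown. The function name `subsampling_probs`, the argument names, the simulated data, and the choice of a pilot subsample to estimate M are all hypothetical.

```python
import numpy as np

def subsampling_probs(X, y, p, M_inv, k=None, rng=None):
    """Probabilities proportional to |y_i - p_i| * ||M_inv x_i|| (assumed logistic form).

    If k is given, the norms are approximated with a d x k Gaussian
    Johnson-Lindenstrauss projection instead of being computed exactly.
    """
    if k is None:
        # Exact: row norms of X @ M_inv, an (n x d)(d x d) product.
        norms = np.linalg.norm(X @ M_inv, axis=1)
    else:
        # Approximate: project M_inv to d x k first, then an (n x d)(d x k) product.
        S = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
        norms = np.linalg.norm(X @ (M_inv @ S), axis=1)
    scores = np.abs(y - p) * norms
    return scores / scores.sum()

# Simulated data (hypothetical sizes, chosen only for the demonstration).
rng = np.random.default_rng(0)
n, d, k = 5000, 50, 25
X = rng.standard_normal((n, d))
beta = np.full(d, 0.1)  # stand-in for a pilot estimate of the coefficients
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = (rng.random(n) < p).astype(float)

# Assumption: M is estimated from a small pilot subsample, not the full data.
pilot = rng.choice(n, size=1000, replace=False)
w = p[pilot] * (1.0 - p[pilot])
M = (X[pilot].T * w) @ X[pilot] / pilot.size
M_inv = np.linalg.inv(M)

pi_exact = subsampling_probs(X, y, p, M_inv)
pi_fast = subsampling_probs(X, y, p, M_inv, k=k, rng=rng)
median_rel_err = np.median(np.abs(pi_fast / pi_exact - 1.0))
```

In this sketch the dominant matrix product drops from roughly O(n d^2) to O(n d k) operations, so a saving is only expected when k is noticeably smaller than d; `median_rel_err` gives a rough sense of the accuracy lost in exchange.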

Keywords: Generalized linear models; Massive data; Optimal subsampling; Randomized algorithm
Date: 2024

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S2452306221000290
Full text for ScienceDirect subscribers only. Contains open access articles

Persistent link: https://EconPapers.repec.org/RePEc:eee:ecosta:v:29:y:2024:i:c:p:224-237

DOI: 10.1016/j.ecosta.2021.02.007

Econometrics and Statistics is currently edited by E.J. Kontoghiorghes, H. Van Dijk and A.M. Colubi

More articles in Econometrics and Statistics from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:ecosta:v:29:y:2024:i:c:p:224-237