Fast Optimal Subsampling Probability Approximation for Generalized Linear Models
JooChul Lee, Elizabeth D. Schifano and HaiYing Wang
Econometrics and Statistics, 2024, vol. 29, issue C, 224-237
Abstract:
For massive data, subsampling techniques are a popular way to mitigate the computational burden by reducing the data size. In a subsampling approach, subsampling probabilities are specified for each data point to obtain informative sub-data, and estimates based on the sub-data are then used to approximate estimates from the full data. Assigning subsampling probabilities by minimizing the asymptotic mean squared error of the estimator from a general subsample (the A-optimality criterion) is a popular approach; however, calculating the probabilities under this setting is still computationally demanding. To efficiently approximate the A-optimal subsampling probabilities for generalized linear models, randomized algorithms are proposed. The algorithms are developed using the Johnson-Lindenstrauss Transform and the Subsampled Randomized Hadamard Transform. Additionally, optimal subsampling probabilities are derived for the Gaussian linear model in the case where both the regression coefficients and the dispersion parameter are of interest, and algorithms are developed to approximate these optimal subsampling probabilities. Simulation studies indicate that the estimators based on the developed algorithms perform very well for statistical inference and offer substantial savings in computing time compared to direct calculation of the A-optimal subsampling probabilities.
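As a rough illustration of the idea in the abstract (a sketch under stated assumptions, not the authors' algorithm), the Python snippet below compares exact and Johnson-Lindenstrauss-sketched subsampling probabilities for logistic regression. It assumes the A-optimal probabilities take the form pi_i proportional to |y_i - p_i| * ||M^{-1} x_i||, with M the weighted information matrix evaluated at a pilot estimate. The simulated data, the pilot estimate, the sketch size k, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def exact_row_norms(X, A):
    """Exact ||A x_i|| for every row x_i of X; costs O(n d^2)."""
    return np.linalg.norm(X @ A.T, axis=1)

def jl_row_norms(X, A, k, rng):
    """Approximate ||A x_i|| using a k x d Gaussian sketch S; costs O(n d k)."""
    d = A.shape[0]
    # Scaling by 1/sqrt(k) gives E||S v||^2 = ||v||^2 for any fixed v.
    S = rng.standard_normal((k, d)) / np.sqrt(k)
    SA = S @ A  # k x d, computed once in O(k d^2)
    return np.linalg.norm(X @ SA.T, axis=1)

def subsampling_probs(abs_residuals, norms):
    """Normalize |y_i - p_i| * norm_i so the probabilities sum to one."""
    weights = abs_residuals * norms
    return weights / weights.sum()

# Simulated logistic-regression data (assumption: illustrative only).
rng = np.random.default_rng(0)
n, d, k = 5000, 50, 20
X = rng.standard_normal((n, d))
beta_true = 0.2 * rng.standard_normal(d)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

# Hypothetical pilot estimate: the true coefficients plus noise.
beta_pilot = beta_true + 0.1 * rng.standard_normal(d)
p_pilot = 1.0 / (1.0 + np.exp(-(X @ beta_pilot)))

# Weighted information matrix M = (1/n) sum_i w_i x_i x_i^T and its inverse.
w = p_pilot * (1.0 - p_pilot)
M = (X.T * w) @ X / n
A = np.linalg.inv(M)

norms_exact = exact_row_norms(X, A)
norms_jl = jl_row_norms(X, A, k, rng)

abs_res = np.abs(y - p_pilot)
pi_exact = subsampling_probs(abs_res, norms_exact)
pi_jl = subsampling_probs(abs_res, norms_jl)

# Total variation distance between the exact and sketched probabilities.
print("TV distance:", 0.5 * np.abs(pi_exact - pi_jl).sum())
```

The saving comes from replacing the d x d matrix applied to every data point with a k x d matrix, where k is smaller than d. A Subsampled Randomized Hadamard Transform could play the role of the Gaussian matrix S here; it is omitted to keep the sketch short.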
Keywords: Generalized linear models; Massive data; Optimal subsampling; Randomized algorithm
Date: 2024
Downloads:
http://www.sciencedirect.com/science/article/pii/S2452306221000290
Full text for ScienceDirect subscribers only. Contains open access articles
Persistent link: https://EconPapers.repec.org/RePEc:eee:ecosta:v:29:y:2024:i:c:p:224-237
DOI: 10.1016/j.ecosta.2021.02.007
Econometrics and Statistics is currently edited by E.J. Kontoghiorghes, H. Van Dijk and A.M. Colubi
Bibliographic data for series maintained by Catherine Liu.