Optimal subsampling for multiplicative regression with massive data

Wang, Tianzhen; Zhang, Haixiang

Statistica Neerlandica, 2022, vol. 76, issue 4, 418-449

Abstract: With massive data, subsampling is a popular way to reduce the data volume and thereby the computational burden. The key idea of subsampling is to perform statistical analysis on a representative subsample drawn from the full data, which provides a practical way to extract useful information from big data. In this article, we develop an efficient subsampling method for large‐scale multiplicative regression models, which can substantially reduce the computational burden of massive data. Under some regularity conditions, we establish consistency and asymptotic normality of the subsample‐based estimator, and derive the optimal subsampling probabilities according to the L‐optimality criterion. A two‐step algorithm is developed to approximate the optimal subsampling procedure, and the convergence rate and asymptotic normality of the two‐step subsample estimator are established. Numerical studies and two real data applications are carried out to evaluate the performance of our subsampling method.

Date: 2022

Downloads: (external link)
https://doi.org/10.1111/stan.12266

Persistent link: https://EconPapers.repec.org/RePEc:bla:stanee:v:76:y:2022:i:4:p:418-449

Statistica Neerlandica is currently edited by Miroslav Ristic, Marijtje van Duijn and Nan van Geloven
