Optimal subsample selection for massive logistic regression with distributed data
Lulu Zuo,
Haixiang Zhang,
HaiYing Wang and
Liuquan Sun
Additional contact information
Lulu Zuo: Tianjin University
Haixiang Zhang: Tianjin University
HaiYing Wang: University of Connecticut
Liuquan Sun: Chinese Academy of Sciences
Computational Statistics, 2021, vol. 36, issue 4, No 9, 2535-2562
Abstract:
With the emergence of big data, it is increasingly common that the data are distributed, i.e., the data are stored at many distributed sites (machines or nodes) owing to data collection or business operations, etc. We propose a distributed subsampling procedure in such a setting to efficiently approximate the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator given the full data. The optimal subsampling probabilities and optimal allocation sizes are explicitly obtained. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of our subsampling method.
Keywords: Allocation size; Big data; Distributed and massive data; Subsample estimator; Subsampling probabilities
Date: 2021
Downloads:
http://link.springer.com/10.1007/s00180-021-01089-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:36:y:2021:i:4:d:10.1007_s00180-021-01089-0
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2
DOI: 10.1007/s00180-021-01089-0
Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.