Distributed one-step upgraded estimation for non-uniformly and non-randomly distributed data
Feifei Wang,
Yingqiu Zhu,
Danyang Huang,
Haobo Qi and
Hansheng Wang
Computational Statistics & Data Analysis, 2021, vol. 162, issue C
Abstract:
One-shot-type (or divide-and-conquer) estimators have been widely used for distributed statistical analysis. However, their outstanding statistical efficiency hinges on two critical conditions. The first is the uniformity condition, which requires that the sample sizes allocated to different Workers be as comparable as possible. The second is the randomness condition, which requires that the data be distributed across Workers as randomly as possible. Both conditions are often violated in practice. The violation of either condition can seriously degrade the statistical efficiency of one-shot estimators, or even make them inconsistent. To fix this problem, a novel one-step upgraded pilot (OSUP) method is proposed. In the first step of the algorithm, a pilot estimate is computed based on randomly selected samples from different Workers. In the second step, a one-step update is conducted from the pilot estimate by summarizing the derivative information on each Worker. The resulting OSUP estimator is theoretically proved to be as statistically efficient as the whole-sample maximum likelihood estimator without any restrictive assumption about distribution uniformity and randomness. Extensive numerical studies are presented to demonstrate the finite-sample performance of the OSUP estimator. Finally, as an illustration, an American Airlines dataset is analyzed on a Spark cluster.
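The two OSUP steps described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: we assume a logistic regression model, a pilot sample built by drawing the same hypothetical fraction `pilot_frac` uniformly at random from every Worker, and a second step that is a single Newton-Raphson update in which each Worker reports only its gradient and Hessian at the pilot estimate. The Workers are simulated as in-memory arrays, deliberately split non-randomly (sorted by a covariate) and non-uniformly (very different sizes); the helper names `grad_hess`, `newton_mle` and `osup` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; clipping guards np.exp against overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def grad_hess(theta, X, y):
    """Gradient and Hessian of the logistic log-likelihood on one Worker's data."""
    p = sigmoid(X @ theta)
    g = X.T @ (y - p)                           # score vector (length p)
    H = -(X * (p * (1 - p))[:, None]).T @ X     # Hessian (p x p, negative definite)
    return g, H

def newton_mle(X, y, iters=25):
    """Plain Newton-Raphson MLE, used for the pilot and the whole-sample benchmark."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(theta, X, y)
        theta = theta - np.linalg.solve(H, g)
    return theta

def osup(workers, pilot_frac, rng):
    """Hypothetical OSUP sketch: random pilot, then one aggregated Newton step."""
    # Step 1: every Worker contributes a uniformly random subsample to the pilot set.
    Xs, ys = [], []
    for X, y in workers:
        k = max(1, int(pilot_frac * len(y)))
        idx = rng.choice(len(y), size=k, replace=False)
        Xs.append(X[idx])
        ys.append(y[idx])
    theta0 = newton_mle(np.vstack(Xs), np.concatenate(ys))
    # Step 2: every Worker evaluates its gradient and Hessian at theta0; the
    # sums are combined centrally into a single Newton-Raphson update.
    g_tot = np.zeros_like(theta0)
    H_tot = np.zeros((theta0.size, theta0.size))
    for X, y in workers:
        g, H = grad_hess(theta0, X, y)
        g_tot += g
        H_tot += H
    return theta0, theta0 - np.linalg.solve(H_tot, g_tot)

# Simulated data (illustrative only): intercept plus two standard normal covariates.
rng = np.random.default_rng(0)
N = 20000
theta_true = np.array([0.5, 1.0, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = (rng.random(N) < sigmoid(X @ theta_true)).astype(float)

# Non-random, non-uniform allocation: sort by the first covariate, then cut
# into Workers of sizes 1000, 2000, 5000 and 12000.
order = np.argsort(X[:, 1])
X, y = X[order], y[order]
cuts = [1000, 3000, 8000]
workers = list(zip(np.split(X, cuts), np.split(y, cuts)))

theta_pilot, theta_osup = osup(workers, pilot_frac=0.05, rng=rng)
theta_full = newton_mle(X, y)
print("pilot:", np.round(theta_pilot, 3))
print("OSUP: ", np.round(theta_osup, 3))
print("full: ", np.round(theta_full, 3))
```

In step 2 only a length-p gradient and a p x p Hessian leave each Worker, so the communication cost does not grow with the Worker sample sizes. Because the pilot is drawn at random from every Worker, it does not inherit the sorted allocation, and the single Newton step then moves it close to the whole-sample MLE.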
Keywords: Distributed system; Non-uniformity; Non-randomness; One-shot estimator; One-step estimator
Date: 2021
Downloads:
http://www.sciencedirect.com/science/article/pii/S0167947321000992
Full text for ScienceDirect subscribers only.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:162:y:2021:i:c:s0167947321000992
DOI: 10.1016/j.csda.2021.107265
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.