Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era
Jarod Y. L. Lee,
James J. Brown and
Louise M. Ryan
The American Statistician, 2017, vol. 71, issue 3, 202-208
Abstract:
The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R; as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online.
Date: 2017
References: View complete reference list from CitEc
Citations: View citations in EconPapers (5)
Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2016.1255659 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:71:y:2017:i:3:p:202-208
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20
DOI: 10.1080/00031305.2016.1255659
Access Statistics for this article
The American Statistician is currently edited by Eric Sampson
More articles in The American Statistician from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().