An Econometric Perspective on Algorithmic Subsampling
Sokbae (Simon) Lee and Serena Ng
Annual Review of Economics, 2020, vol. 12, issue 1, 45-80
Abstract:
Data sets that are terabytes in size are increasingly common, but computer bottlenecks often frustrate a complete analysis of the data, and diminishing returns suggest that we may not need terabytes of data to estimate a parameter or test a hypothesis. But which rows of data should we analyze, and might an arbitrary subset preserve the features of the original data? We review a line of work grounded in theoretical computer science and numerical linear algebra that finds that an algorithmically desirable sketch, which is a randomly chosen subset of the data, must preserve the eigenstructure of the data, a property known as subspace embedding. Building on this work, we study how prediction and inference can be affected by data sketching within a linear regression setup. We use statistical arguments to provide “inference-conscious” guides to the sketch size and show that an estimator that pools over different sketches can be nearly as efficient as the infeasible one using the full sample.
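As a rough illustration of the ideas in the abstract, the following Python sketch fits a linear regression on a uniformly sampled subset of rows, compares the singular values of the rescaled sketch with those of the full design matrix (an informal check of the subspace embedding property), and averages estimates over several non-overlapping sketches. The simulated data, the function names, the sketch size `m`, and the simple averaging rule are all assumptions made for this example; they are not taken from the article and need not match the authors' estimators or recommended sketch sizes.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients computed with numpy.linalg.lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def sketched_ols(X, y, m, rng):
    """OLS on m rows drawn uniformly at random without replacement."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return ols(X[idx], y[idx]), idx


def pooled_sketched_ols(X, y, m, n_sketches, rng):
    """Average the OLS estimates from n_sketches disjoint sketches of size m.

    Simple averaging is an assumption of this example, not necessarily
    the pooling rule studied in the article.
    """
    n = X.shape[0]
    if m * n_sketches > n:
        raise ValueError("not enough rows for disjoint sketches")
    perm = rng.permutation(n)
    estimates = []
    for j in range(n_sketches):
        rows = perm[j * m:(j + 1) * m]
        estimates.append(ols(X[rows], y[rows]))
    return np.mean(estimates, axis=0)


# Simulated data (hypothetical design, chosen only for illustration).
rng = np.random.default_rng(0)
n, d, m = 100_000, 5, 1_000
X = rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.standard_normal(n)

beta_full = ols(X, y)
beta_one, idx = sketched_ols(X, y, m, rng)
beta_pool = pooled_sketched_ols(X, y, m, n_sketches=20, rng=rng)

# Informal subspace embedding check: after rescaling by sqrt(n / m),
# the sketch's singular values should be close to the full matrix's.
sv_full = np.linalg.svd(X, compute_uv=False)
sv_sketch = np.linalg.svd(np.sqrt(n / m) * X[idx], compute_uv=False)
sv_ratio = sv_sketch / sv_full

print("max error, full sample:   ", np.abs(beta_full - beta_true).max())
print("max error, single sketch: ", np.abs(beta_one - beta_true).max())
print("max error, pooled sketches:", np.abs(beta_pool - beta_true).max())
print("singular value ratios:    ", sv_ratio)
```

Uniform sampling works well here because the simulated rows are homogeneous; data with a few highly influential rows may call for other sampling schemes.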
Date: 2020
Citations: View citations in EconPapers (7)
Downloads: (external link)
https://doi.org/10.1146/annurev-economics-022720-114138
Full text downloads are only available to subscribers. Visit the abstract page for more information.
Related works:
Working Paper: An Econometric Perspective on Algorithmic Subsampling (2020) 
Working Paper: An econometric perspective on algorithmic subsampling (2020) 
Persistent link: https://EconPapers.repec.org/RePEc:anr:reveco:v:12:y:2020:p:45-80
Ordering information: This journal article can be ordered from
http://www.annualreviews.org/action/ecommerce
DOI: 10.1146/annurev-economics-022720-114138
More articles in Annual Review of Economics from Annual Reviews, 4139 El Camino Way, Palo Alto, CA 94306, USA.
Bibliographic data for series maintained by http://www.annualreviews.org.