Unrepresentative big surveys significantly overestimated US vaccine uptake
Valerie C. Bradley,
Shiro Kuriwaki,
Michael Isakov,
Dino Sejdinovic,
Xiao-Li Meng and
Seth Flaxman
Additional contact information
Valerie C. Bradley: University of Oxford
Shiro Kuriwaki: Stanford University
Michael Isakov: Harvard University
Dino Sejdinovic: University of Oxford
Xiao-Li Meng: Harvard University
Seth Flaxman: University of Oxford
Nature, 2021, vol. 600, issue 7890, 695-700
Abstract:
Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox [1]. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi–Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 every two weeks). In May 2021, Delphi–Facebook overestimated uptake by 17 percentage points (14–20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11–17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios–Ipsos online panel [5] with about 1,000 responses per week following survey research best practices [6] provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework [1] to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.
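The claim that a very large biased survey can be worth only a handful of random responses can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes the error identity usually associated with the Big Data Paradox framework the abstract cites, namely that the gap between the sample mean and the population mean equals the correlation between responding and the outcome, times sqrt((1 - f) / f), times the population standard deviation, where f = n / N is the sampling fraction. The population size, the true uptake rate, the response probabilities, and the function name `decompose_error` are all hypothetical values chosen only for illustration.

```python
import numpy as np


def decompose_error(y, r):
    """Decompose the error of a sample mean (hypothetical helper).

    y: outcome for every unit in a finite population (e.g. 1 = vaccinated).
    r: boolean mask, True where the unit responded to the survey.
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=bool)
    N = y.size
    n = int(r.sum())
    f = n / N                                    # sampling fraction
    rho = np.corrcoef(r.astype(float), y)[0, 1]  # "data defect" correlation
    sigma = y.std()                              # population sd (ddof=0)
    error = y[r].mean() - y.mean()               # actual estimation error
    identity = rho * np.sqrt((1 - f) / f) * sigma
    # Size of a simple random sample with the same squared error,
    # ignoring the finite-population correction.
    n_eff = f / ((1 - f) * rho ** 2)
    return {"n": n, "rho": rho, "error": error,
            "identity": identity, "n_eff": n_eff}


# Hypothetical population: 1,000,000 adults, 60% vaccinated.
rng = np.random.default_rng(0)
N = 1_000_000
y = rng.random(N) < 0.60
# Assumed selection bias: vaccinated people respond slightly more often.
response_prob = np.where(y, 0.012, 0.008)
r = rng.random(N) < response_prob

result = decompose_error(y, r)
for key, value in result.items():
    print(f"{key}: {value:.4g}")
```

In this setup the two response probabilities differ by less than half a percentage point, yet the resulting effective sample size is a tiny fraction of the roughly ten thousand simulated respondents, and the `error` and `identity` entries agree up to floating-point rounding because the decomposition is an algebraic identity rather than an approximation.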
Date: 2021
Downloads: (external link)
https://www.nature.com/articles/s41586-021-04198-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:600:y:2021:i:7890:d:10.1038_s41586-021-04198-4
Ordering information: This journal article can be ordered from
https://www.nature.com/
DOI: 10.1038/s41586-021-04198-4
Nature is currently edited by Magdalena Skipper
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.