A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report
John Abowd (),
Tamara Adams,
Robert Ashmead,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Nathan Goldschlag,
Daniel Kifer,
Philip Leclerc,
Ethan Lew,
Scott Moore,
Rolando A. Rodríguez,
Ramy N. Tadros and
Lars Vilhuber
No 31995, NBER Working Papers from National Bureau of Economic Research, Inc
Abstract:
For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to adopt different strategies to protect the confidentiality of aggregate tabular data products from those used to protect the individual records contained in publicly released microdata products. This strategy was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this common assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level—individual census blocks, which can have populations as small as one person. Using only 34 of the published table sets, we reconstructed microdata records including five variables (census block, sex, age, race, and ethnicity) from the confidential 2010 Census person records. Using only published data, an attacker using our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. We further confirm, through reidentification studies, that an attacker can, within census blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with race and ethnicity different from the modal person on the census block) with 95% accuracy. Having shown the vulnerabilities inherent to the disclosure limitation methods used for the 2010 Census, we proceed to demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against attacks that are based on reconstruction. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality, or would overly degrade the statistics’ utility for the primary statutory use case: redrawing the boundaries of all of the nation’s legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper see https://doi.org/10.1162/99608f92.4a1ebf70.
JEL-codes: C19 J10 (search for similar items in EconPapers)
Date: 2023-12
Note: LS
References: Add references at CitEc
Citations:
Downloads: (external link)
http://www.nber.org/papers/w31995.pdf (application/pdf)
Related works:
Working Paper: A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report (2025) 
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nbr:nberwo:31995
Ordering information: This working paper can be ordered from
http://www.nber.org/papers/w31995
Access Statistics for this paper
More papers in NBER Working Papers from National Bureau of Economic Research, Inc National Bureau of Economic Research, 1050 Massachusetts Avenue Cambridge, MA 02138, U.S.A.. Contact information at EDIRC.
Bibliographic data for series maintained by ().