Bayesian non‐parametric generation of fully synthetic multivariate categorical data in the presence of structural zeros
Daniel Manrique‐Vallier and
Jingchen Hu
Journal of the Royal Statistical Society Series A, 2018, vol. 181, issue 3, 635-647
Abstract:
Statistical agencies are increasingly adopting synthetic data methods for disseminating microdata without compromising the privacy of respondents. Crucial to the implementation of these approaches are flexible models, able to capture the nuances of the multivariate structure in the original data. In the case of multivariate categorical data, preserving this multivariate structure also often involves satisfying constraints in the form of combinations of responses that cannot logically be present in any data set—like married toddlers or pregnant men—also known as structural zeros. Ignoring structural zeros can result in both logically inconsistent synthetic data and biased estimates. Here we propose the use of a Bayesian non‐parametric method for generating discrete multivariate synthetic data subject to structural zeros. This method can preserve complex multivariate relationships between variables, can be applied to high dimensional data sets with massive collections of structural zeros, requires minimal tuning from the user and is computationally efficient. We demonstrate our approach by synthesizing an extract of 17 variables from the 2000 US census. Our method produces synthetic samples with high analytic utility and low disclosure risk.
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)
Downloads: (external link)
https://doi.org/10.1111/rssa.12352
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssa:v:181:y:2018:i:3:p:635-647
Ordering information: This journal article can be ordered from
http://ordering.onli ... 1111/(ISSN)1467-985X
Access Statistics for this article
Journal of the Royal Statistical Society Series A is currently edited by A. Chevalier and L. Sharples
More articles in Journal of the Royal Statistical Society Series A from Royal Statistical Society Contact information at EDIRC.
Bibliographic data for series maintained by Wiley Content Delivery ().