EconPapers    
 

Distribution-preserving statistical disclosure limitation

Simon Woodcock and Gary Benedetto

Computational Statistics & Data Analysis, 2009, vol. 53, issue 12, 4228-4242

Abstract: One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A misspecified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of synthetic values. We present a practical method to generate synthetic values when the imputer has only limited information about the true data-generating process. We combine a simple imputation model (such as regression) with density-based transformations that preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulations and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
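The abstract pairs a simple imputation model with a density-based transformation applied within subdomains. The sketch below illustrates that idea under our own assumptions; it is not the authors' algorithm. It fits a one-covariate least-squares regression, draws synthetic values from it, and then, within each subdomain, keeps only the rank of each draw and replaces it with the corresponding quantile of a Gaussian kernel density estimate of the confidential values in that subdomain. The function names (`distribution_preserving_synthesis`, `kde_quantile`, etc.), the Gaussian kernel, the rule-of-thumb bandwidth, and the rank-to-quantile mapping are all hypothetical choices made for illustration.

```python
import random
import statistics
from collections import defaultdict

# Standard normal distribution, used as the smoothing kernel.
STD_NORMAL = statistics.NormalDist()


def fit_ols(x, y):
    """One-covariate least squares: returns (intercept, slope, residual sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_sd = statistics.pstdev([b - intercept - slope * a for a, b in zip(x, y)])
    return intercept, slope, resid_sd


def kde_cdf(t, data, h):
    """CDF at t of a Gaussian kernel density estimate with bandwidth h."""
    return statistics.fmean(STD_NORMAL.cdf((t - d) / h) for d in data)


def kde_quantile(p, data, h, iters=60):
    """Invert kde_cdf by bisection; the CDF is continuous and increasing."""
    lo, hi = min(data) - 10 * h, max(data) + 10 * h
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def distribution_preserving_synthesis(x, y, group, seed=0):
    """Hypothetical sketch, not the authors' algorithm.

    x: covariate, y: confidential variable, group: subdomain label per record.
    """
    rng = random.Random(seed)
    a, b, s = fit_ols(x, y)
    # Step 1: draw from a deliberately simple pooled regression model
    # (it ignores the subdomains, so it may well be mis-specified).
    draws = [a + b * xi + rng.gauss(0.0, s) for xi in x]
    # Step 2: within each subdomain, keep only the ranks of the draws and map
    # them onto quantiles of a smoothed estimate of the confidential distribution.
    members = defaultdict(list)
    for i, g in enumerate(group):
        members[g].append(i)
    synthetic = [0.0] * len(y)
    for idx in members.values():
        conf = [y[i] for i in idx]
        n = len(conf)
        # Rule-of-thumb bandwidth (an assumption); the floor avoids h == 0.
        h = max(1.06 * statistics.pstdev(conf) * n ** -0.2, 1e-6)
        for r, i in enumerate(sorted(idx, key=lambda j: draws[j])):
            synthetic[i] = kde_quantile((r + 0.5) / n, conf, h)
    return synthetic
```

Because only the ranks of the regression draws survive, a poor regression affects which record receives which value but not the released distribution within a subdomain, which is the property the abstract emphasizes; smoothing through the kernel estimate means the released values are not copies of confidential ones. Calling the function with different seeds yields several synthetic implicates, although a proper multiple-imputation scheme would also draw the regression parameters from their posterior, which this sketch omits.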

Date: 2009
Citations: 3 (listed in EconPapers)

Download: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00201-1
Full text for ScienceDirect subscribers only.

Related works:
Working Paper: Distribution-Preserving Statistical Disclosure Limitation (2007)
Working Paper: Distribution Preserving Statistical Disclosure Limitation (2006)
Working Paper: Distribution-Preserving Statistical Disclosure Limitation (2006)


Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:53:y:2009:i:12:p:4228-4242


Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4228-4242