Abstract:
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. When imputing confidential values, a mis-specified model can invalidate inferences, because the distribution of synthetic data is determined by the model used to generate them. We present a practical method to generate synthetic values when the imputer has only limited information about the true data generating process. We combine a simple imputation model (such as regression) with a series of density-based transformations to preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulation and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
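The general idea of pairing a simple imputation model with a distribution-preserving transformation can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in plain rank-based quantile mapping for the density-based transformations the abstract mentions, it handles a single subdomain, and it makes no attempt to assess disclosure risk. The function names (`fit_line`, `quantile`, `synthesize`) and the simulated lognormal data are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of pre-sorted values, for 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def synthesize(x, y, rng):
    """One synthetic replicate of y within a single subdomain (illustrative)."""
    # Step 1: draw from a deliberately simple (possibly mis-specified) model.
    a, b, sd = fit_line(x, y)
    draws = [a + b * xi + rng.gauss(0.0, sd) for xi in x]
    # Step 2: map each draw, by its rank, onto the interpolated quantiles of
    # the confidential values, so the marginal distribution follows y
    # rather than the normal-error regression model.
    order = sorted(range(len(draws)), key=draws.__getitem__)
    target = sorted(y)
    n = len(y)
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = quantile(target, (rank + 0.5) / n)
    return out


# Simulated skewed "confidential" variable; a linear model with normal
# errors is mis-specified for it.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [math.exp(0.5 * xi + rng.gauss(0.0, 0.5)) for xi in x]

# Multiple imputation: several independent synthetic replicates.
replicates = [synthesize(x, y, rng) for _ in range(3)]
```

In this sketch the regression step supplies the dependence on the covariate, while the rank mapping restores the skewed marginal shape that the normal-error model would otherwise lose. Applying the same two steps separately within each subdomain would be one way to approximate the "specified subdomains" aspect, again as an assumption rather than a description of the paper.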
Ordering information: This working paper can be ordered from Working Paper Coordinator, Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada http://www.econ.sfu. ... lications/index.html
More papers in Discussion Papers from Department of Economics, Simon Fraser University. Address: Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada. Series data maintained by Working Paper Coordinator.