Abstract:
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. When imputing confidential values, a mis-specified model can invalidate inferences, because the distribution of synthetic data is determined by the model used to generate them. We present a practical method to generate synthetic values when the imputer has only limited information about the true data generating process. We combine a simple imputation model (such as regression) with a series of density-based transformations to preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulation and a large scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
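The general idea of pairing a simple imputation model with a distribution-restoring transformation can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not specify the transformations, so the sketch substitutes a rank-based mapping onto the empirical quantiles of the confidential values within each subdomain. The function name `synthesize`, the simulated data, and the two-group subdomain definition are all assumptions made for illustration.

```python
import numpy as np


def synthesize(y, X, groups, rng):
    """Regression imputation followed by a per-subdomain quantile mapping.

    Illustrative stand-in only: an empirical-quantile mapping is used in
    place of the density-based transformations mentioned in the abstract.
    """
    # Step 1: simple imputation model (OLS with normal errors).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = resid.std(ddof=X.shape[1])
    draw = X @ beta + rng.normal(0.0, sigma, size=y.shape[0])

    # Step 2: within each subdomain, keep the ordering of the model draws
    # but map them onto the empirical distribution of the confidential y.
    synth = np.empty_like(y, dtype=float)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        ranks = draw[idx].argsort().argsort()   # ranks 0 .. n_g - 1
        u = (ranks + 0.5) / idx.size            # probabilities in (0, 1)
        synth[idx] = np.quantile(y[idx], u)
    return synth


# Hypothetical example: a skewed outcome that a normal linear model misfits.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
groups = (x > 0).astype(int)                    # two assumed subdomains
X = np.column_stack([np.ones(n), x])
y = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))
y_syn = synthesize(y, X, groups, rng)
```

In this sketch the regression draws alone would be roughly symmetric, while the mapped values inherit the skewness of the confidential data in each subdomain. Because the mapped values are interpolated from the confidential order statistics, this simplified version says nothing about disclosure risk, which the paper assesses separately.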
Ordering information: This working paper can be ordered from Working Paper Coordinator, Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada http://www.sfu.ca/ec ... ch/publications.html
Series: Discussion Papers, Department of Economics, Simon Fraser University.