Balancing Inferential Integrity and Disclosure Risk Via Model Targeted Masking and Multiple Imputation
Bei Jiang,
Adrian E. Raftery,
Russell J. Steele and
Naisyin Wang
Journal of the American Statistical Association, 2021, vol. 117, issue 537, 52-66
Abstract:
There is a growing expectation that data collected by government-funded studies should be openly available to ensure research reproducibility, which also increases concerns about data privacy. A strategy to protect individuals’ identity is to release multiply imputed (MI) synthetic datasets with masked sensitive values. However, information loss or incorrectly specified imputation models can weaken or invalidate the inferences obtained from the MI-datasets. We propose a new masking framework with a data-augmentation (DA) component and a tuning mechanism that balances protecting against identity disclosure with preserving data utility. Applying it to a restricted-use Canadian Scleroderma Research Group (CSRG) dataset, we found that this DA-MI strategy achieved a 0% identity disclosure risk and preserved all inferential conclusions. It yielded 95% confidence intervals (CIs) that had overlaps of 98.5% (95.5%) on average with the CIs constructed using the full, unmasked CSRG dataset in a work-disability (interstitial lung disease) study. The CI-overlaps were lower for several other methods considered, ranging from 73.9% to 91.9% on average with the lowest value being 28.1%; such low CI-overlaps further led to some incorrect inferential conclusions. These findings indicate that the DA-MI masking framework facilitates sharing of useful research data while protecting participants’ identities. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Date: 2021
Downloads: (external link)
http://hdl.handle.net/10.1080/01621459.2021.1909597 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:117:y:2021:i:537:p:52-66
DOI: 10.1080/01621459.2021.1909597
Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson