 

Class-Restricted Clustering and Microperturbation for Data Privacy

Xiao-Bai Li and Sumit Sarkar
Xiao-Bai Li: Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854
Sumit Sarkar: School of Management, University of Texas at Dallas, Richardson, Texas 75080

Management Science, 2013, vol. 59, issue 4, 796-812

Abstract: The extensive use of information technologies by organizations to collect and share personal data has raised strong privacy concerns. To respond to the public's demand for data privacy, a class of clustering-based data masking techniques is increasingly being used for privacy-preserving data sharing and analytics. Although they address reidentification risks, traditional clustering-based approaches for masking numeric attributes typically do not consider the disclosure risk of categorical confidential attributes. We propose a new approach to deal with this problem. The proposed method clusters data such that the data points within a group are similar in their nonconfidential attribute values, whereas the confidential attribute values within a group are well distributed. To accomplish this, the clustering method, which is based on a minimum spanning tree (MST) technique, uses two risk-utility trade-off measures, one in the growing stage and one in the pruning stage of the MST technique. As part of our approach we also propose a novel cluster-level microperturbation method for masking data that overcomes a common problem of traditional clustering-based methods for data masking: their inability to preserve important statistical properties such as the variance of attributes and the covariance across attributes. We show that the mean vector and the covariance matrix of the masked data generated using the microperturbation method are unbiased estimates of the original mean vector and covariance matrix. An experimental study on several real-world data sets demonstrates the effectiveness of the proposed approach. This paper was accepted by Sandra Slaughter, information systems.
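The abstract does not spell out the microperturbation procedure. As a rough illustration of the general idea of cluster-level perturbation that keeps first and second moments intact, the Python sketch below uses a generic moment-matching construction: within each cluster it whitens random noise, then re-colors it with the cluster's own mean vector and covariance matrix. This is not presented as the authors' method. The function names `mask_clusters` and `perturb_cluster` are hypothetical, the clusters are assumed to be already formed (for example by the MST step, which is not implemented here), only numeric attributes are handled, and each cluster is assumed to contain more records than attributes. Unlike the unbiasedness result stated in the abstract, this particular construction preserves each cluster's sample mean and covariance exactly, up to floating-point error.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def mean_vector(rows: Matrix) -> List[float]:
    """Column means of a list of numeric records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Matrix) -> Matrix:
    """Sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L x = b for a lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((b[i] - s) / L[i][i])
    return x


def perturb_cluster(rows: Matrix, rng: random.Random) -> Matrix:
    """Hypothetical helper: replace a cluster's records with noise that has
    exactly the cluster's sample mean vector and sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    if n <= p:
        raise ValueError("each cluster needs more records than attributes")
    mu_x, L_x = mean_vector(rows), cholesky(covariance(rows))
    noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    mu_z, L_z = mean_vector(noise), cholesky(covariance(noise))
    masked = []
    for z in noise:
        # Center and whiten the noise: whitened rows have mean 0, identity covariance.
        w = forward_solve(L_z, [z[j] - mu_z[j] for j in range(p)])
        # Re-color with the cluster's covariance factor, then shift to its mean.
        masked.append([mu_x[i] + sum(L_x[i][k] * w[k] for k in range(i + 1))
                       for i in range(p)])
    return masked


def mask_clusters(clusters: List[Matrix], seed: int = 0) -> List[Matrix]:
    """Hypothetical helper: perturb every cluster independently."""
    rng = random.Random(seed)
    return [perturb_cluster(c, rng) for c in clusters]
```

Because the full-data covariance decomposes into a within-cluster part and a between-cluster part, matching every cluster's mean and covariance also preserves the mean vector and covariance matrix of the whole masked data set. The price of this simple design is the requirement that clusters be larger than the number of attributes; it also ignores the categorical confidential attribute, which in the paper is handled by the class-restricted clustering step.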

Keywords: confidentiality; minimum spanning tree; microaggregation; data perturbation; information theory
Date: 2013

Downloads:
http://dx.doi.org/10.1287/mnsc.1120.1584 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:59:y:2013:i:4:p:796-812

More articles in Management Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:ormnsc:v:59:y:2013:i:4:p:796-812