Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data
Xiao-Bai Li () and
Sumit Sarkar ()
Additional contact information
Xiao-Bai Li: College of Management, University of Massachusetts Lowell, Lowell, Massachusetts 01854
Sumit Sarkar: School of Management, University of Texas at Dallas, Richardson, Texas 75080
Information Systems Research, 2006, vol. 17, issue 3, 254-270
Abstract:
To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution). Experiments conducted on several real-world data sets demonstrate the effectiveness of the proposed method.
Keywords: privacy; data confidentiality; data mining; linear programming; Bayesian estimation; data swapping (search for similar items in EconPapers)
Date: 2006
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (10)
Downloads: (external link)
http://dx.doi.org/10.1287/isre.1060.0095 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:orisre:v:17:y:2006:i:3:p:254-270
Access Statistics for this article
More articles in Information Systems Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().