Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining
Xiao-Bai Li and
Sumit Sarkar
Additional contact information
Xiao-Bai Li: Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854
Sumit Sarkar: School of Management, The University of Texas at Dallas, Richardson, Texas 75080
Operations Research, 2009, vol. 57, issue 6, 1496-1509
Abstract:
Data-mining techniques can be used not only to study collective behavior about customers, but also to discover private information about individuals. In this study, we demonstrate that decision trees, a popular classification technique for data mining, can be used to effectively reveal individuals' confidential data, even when the identities of the individuals are not present in the data. We propose a novel approach for organizations to protect confidential data from such a classification attack. The key components of this approach include a set of entropy-based measures to evaluate disclosure risks of individual records, an optimal pruning algorithm to identify high-risk records, and a pair of data-swapping procedures to reduce the disclosure risks. The proposed method provides the best trade-off between data utility and privacy protection against classification attacks. It can be applied to data with both numeric and categorical attributes. An experimental study on six real-world data sets shows that the proposed method is very effective in protecting privacy while enabling legitimate data mining and analysis.
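As a rough illustration of why entropy is a natural basis for a disclosure-risk measure, the sketch below computes the plain Shannon entropy of a confidential attribute within groups of records, such as the leaves of a decision tree. It is a generic sketch, not the measures defined in the paper, which are not reproduced in this abstract. The function names, the record layout, and the threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_entropy_groups(records, group_key, confidential_key, threshold):
    """Return {group: entropy} for groups whose confidential attribute
    has entropy at or below `threshold`.

    Assumption (not from the paper): a group stands in for a tree leaf,
    and low entropy is read as high disclosure risk, because an attacker
    who places a record in that group can guess its confidential value
    with little uncertainty.
    """
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record[confidential_key])
    flagged = {}
    for group, values in groups.items():
        h = shannon_entropy(values)
        if h <= threshold:
            flagged[group] = h
    return flagged


# Hypothetical records: "leaf" is the group, "income" is confidential.
records = [
    {"leaf": "A", "income": "high"},
    {"leaf": "A", "income": "high"},
    {"leaf": "B", "income": "high"},
    {"leaf": "B", "income": "low"},
]
flagged = low_entropy_groups(records, "leaf", "income", threshold=0.5)
```

In this toy data, every record in leaf A shares the same confidential value, so that leaf is flagged, while leaf B is evenly split and is not. Records in flagged groups would be the candidates for a protective step such as data swapping.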
Keywords: computers; databases/artificial intelligence; data mining; decision trees; pruning; public sector; society; privacy; probability; entropy; relative entropy
Date: 2009
Downloads:
http://dx.doi.org/10.1287/opre.1090.0702 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:57:y:2009:i:6:p:1496-1509
Bibliographic data for series maintained by Chris Asher.