EconPapers    
Economics at your fingertips  
 

Minimizing Information Loss and Preserving Privacy

Syam Menon () and Sumit Sarkar ()
Additional contact information
Syam Menon: School of Management, The University of Texas at Dallas, Richardson, Texas 75083
Sumit Sarkar: School of Management, The University of Texas at Dallas, Richardson, Texas 75083

Management Science, 2007, vol. 53, issue 1, 101-116

Abstract: The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the form of itemsets that need to be suppressed before the data is released. This paper considers the problem of minimizing the number of nonsensitive itemsets lost while concealing sensitive ones. It is shown to be an intractably large version of an NP-hard problem. Consequently, a two-phased procedure that involves the solution of two smaller NP-hard problems is proposed as a practical and effective alternative. In the first phase, a procedure to solve a sanitization problem identifies how the support for sensitive itemsets could be eliminated from a specific transaction by removing the fewest number of items from it. This leads to a modified frequent itemset hiding problem, where transactions to be sanitized are selected such that the number of nonsensitive itemsets lost, while concealing sensitive ones, is minimized. Heuristic procedures are developed for these problems using intuition derived from their integer programming formulations. Results from computational experiments conducted on a publicly available retail data set and three large data sets generated using IBM's synthetic data generator indicate that these approaches are very effective, solving problems involving up to 10 million transactions in a short period of time. The results also show that the process of sanitization has considerable bearing on the quality of solutions obtained.

Keywords: sensitive itemsets; hiding patterns; itemset mining; sanitization (search for similar items in EconPapers)
Date: 2007
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (9)

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.1060.0603 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:53:y:2007:i:1:p:101-116

Access Statistics for this article

More articles in Management Science from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:ormnsc:v:53:y:2007:i:1:p:101-116