Modifying Transactional Databases to Hide Sensitive Association Rules

Menon, Syam; Ghoshal, Abhijeet; Sarkar, Sumit

Modifying Transactional Databases to Hide Sensitive Association Rules

Syam Menon (), Abhijeet Ghoshal () and Sumit Sarkar ()
Additional contact information
Syam Menon: Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080
Abhijeet Ghoshal: Gies College of Business, University of Illinois Urbana–Champaign, Champaign, Illinois 61820
Sumit Sarkar: Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080

Information Systems Research, 2022, vol. 33, issue 1, 152-178

Abstract: Firms have been sharing transactional data with business partners ever since electronic data interchange was introduced to the retail industry in the 1980s. The potential benefits of data sharing notwithstanding, there has been continued reluctance on the part of data owners to share their data, for fear of sensitive information potentially making its way to competitors. Approaches that can help hide sensitive information could alleviate such concerns and increase the number of firms that are willing to share. Sensitive information in transactional databases often manifests itself in the form of association rules. Association rules can be concealed by altering transactions such that these sensitive rules stay hidden when the data are mined. The problem of hiding sensitive association rules is NP-hard, and to date, it has only been addressed via heuristic approaches. In this paper, we introduce a nonlinear integer formulation to hide sensitive association rules while maximizing the accuracy of the altered database. We then separate it into two problems: the sanitization problem , which hides sensitive association rules from a specific transaction while altering the transaction as minimally as possible, and the accuracy maximization problem , which maximizes the accuracy of the altered database, given a solution to the sanitization problem. We show how the sanitization problem can be represented as an integer program and propose a heuristic based on intuition from this formulation to solve it. Next, we formulate the accuracy maximization problem as a nonlinear integer program, show how it can be linearized, and derive various results that help reduce the size of the problem to be solved. Computational experiments are conducted on real and synthetic data sets, the largest of which has 100 million transactions. Our results show that although the nonlinear integer formulations are not practical, the linearizations and problem-reduction steps make a significant impact on solvability and solution time. We also find that there are substantial gains to be realized vis-à-vis existing approaches in terms of both solution quality and solution time and that hiding the rules directly (rather than by hiding the associated itemsets) can result in significantly fewer transactions being sanitized.

Keywords: database sharing; maximizing accuracy; association rule hiding; data quality (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/isre.2021.1033 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:orisre:v:33:y:2022:i:1:p:152-178

Access Statistics for this article

More articles in Information Systems Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().