Reidentification Risk in Panel Data: Protecting for k-Anonymity
Shaobo Li,
Matthew J. Schneider,
Yan Yu and
Sachin Gupta
Additional contact information
Shaobo Li: School of Business, University of Kansas, Lawrence, Kansas 66045
Matthew J. Schneider: LeBow College of Business, Drexel University, Philadelphia, Pennsylvania 19104
Yan Yu: Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, Ohio 45221
Sachin Gupta: SC Johnson College of Business, Cornell University, Ithaca, New York 14853
Information Systems Research, 2023, vol. 34, issue 3, 1066-1088
Abstract:
We consider the risk of reidentification of panelists in marketing research data that are widely used to obtain insights into buyer behavior and to develop marketing strategy. We find that 17%–94% of the panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We first demonstrate that the risk of reidentification is vastly understated by unicity, the conventional measure. Instead, we propose a new measure of reidentification risk, termed sno-unicity, which accounts for the longitudinal nature of panel data, and show that it is much larger than unicity. To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called graph-based minimum movement k-anonymization (k-MM) that is designed especially for panel data. The proposed k-MM approach can be formulated as an optimization problem in which the objective is to minimally distort variables in the original data based on weights that users prespecify corresponding to their use case. We further show how our approach can be extended to achieve l-diversity. We apply the k-MM approach to two different panel data sets that are widely used in marketing research. To achieve a given privacy level, compared with several benchmark protection methods, the protected data from our method result in the least distortion in inferences about key marketing metrics, such as brand market shares, share of category requirements, brand switching rates, and marketing-mix parameters estimated from a hierarchical Bayesian brand choice model.
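As a rough illustration of the two ideas the abstract relies on, the following Python sketch checks a toy panel for unique purchase histories and for k-anonymity. It is a minimal sketch under stated assumptions, not the authors' method: the panel data, the function names (`history_counts`, `unique_fraction`, `is_k_anonymous`), and the choice to compare entire purchase histories are all hypothetical, and the uniqueness share computed here is a simplification that should not be read as the paper's unicity or sno-unicity measure or as the k-MM algorithm.

```python
from collections import Counter


def history_counts(panel):
    """Count how many panelists share each complete purchase history."""
    return Counter(tuple(history) for history in panel.values())


def unique_fraction(panel):
    """Share of panelists whose complete history matches no other panelist.

    Assumption: a simplified uniqueness measure for illustration only.
    """
    counts = history_counts(panel)
    unique = sum(1 for history in panel.values() if counts[tuple(history)] == 1)
    return unique / len(panel)


def is_k_anonymous(panel, k):
    """True if every history is shared by at least k panelists."""
    return all(count >= k for count in history_counts(panel).values())


# Hypothetical toy panel: panelist id -> sequence of brands purchased.
panel = {
    "p1": ["A", "A", "B"],
    "p2": ["A", "A", "B"],
    "p3": ["B", "C", "A"],
    "p4": ["C", "C", "C"],
}

print(unique_fraction(panel))    # share of panelists with a unique history
print(is_k_anonymous(panel, 2))  # whether the toy panel is 2-anonymous
```

In this toy panel, p3 and p4 each have a history that no one else shares, so the data are not 2-anonymous. A protection method would have to alter some records until every history appears at least k times, which is the kind of distortion the abstract says k-MM aims to keep minimal.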
Keywords: brand choice; data privacy; data sharing; hierarchical Bayesian model; optimization; unicity
Date: 2023
Downloads: http://dx.doi.org/10.1287/isre.2022.1169 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orisre:v:34:y:2023:i:3:p:1066-1088