Improving Utility of Private Join Size Estimation via Shuffling
Xin Liu,
Yibin Mao,
Meifan Zhang () and
Mohan Li
Additional contact information
Xin Liu: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Yibin Mao: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Meifan Zhang: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Mohan Li: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Mathematics, 2025, vol. 13, issue 21, 1-26
Abstract:
Join size estimation plays a crucial role in query optimization, correlation computing, and dataset discovery. A recent study, LDPJoinSketch, has explored the application of local differential privacy (LDP) to protect the privacy of two data sources when estimating their join size. However, the utility of LDPJoinSketch remains unsatisfactory due to the significant noise introduced by perturbation under LDP. In contrast, the shuffle model of differential privacy (SDP) can offer higher utility than LDP, as it introduces randomness based on both shuffling and perturbation. Nevertheless, existing research on SDP primarily focuses on basic statistical tasks, such as frequency estimation and binary summation. There is a paucity of studies addressing queries that involve join aggregation of two private data sources. In this paper, we investigate the problem of private join size estimation in the context of the shuffle model. First, drawing inspiration from the success of sketches in summarizing data under LDP, we propose a sketch-based join size estimation algorithm, SDPJoinSketch, under SDP, which demonstrates greater utility than LDPJoinSketch. We present theoretical proofs of the privacy amplification and utility of our method. Second, we consider separating high- and low-frequency items to reduce the hash-collision error of the sketch and propose an enhanced method called SDPJoinSketch+. Unlike LDPJoinSketch, we utilize secure encryption techniques to preserve frequency properties rather than perturbing them, further enhancing utility. Extensive experiments on both real-world and synthetic datasets validate the superior utility of our methods.
Keywords: differential privacy; shuffle model; join query; sketch (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3468/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3468/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3468-:d:1783605
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().