Differentially Private and Budget-Limited Bandit Learning over Matroids
Kai Han,
Yuntian He,
Alex X. Liu,
Shaojie Tang and
He Huang
Additional contact information
Kai Han: School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 230026, People’s Republic of China
Yuntian He: School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 230026, People’s Republic of China
Alex X. Liu: Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824
Shaojie Tang: Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080
He Huang: School of Computer Science and Technology, Soochow University, Suzhou 215006, People’s Republic of China
INFORMS Journal on Computing, 2020, vol. 32, issue 3, 790-804
Abstract:
We propose the first budget-limited multi-armed bandit (BMAB) algorithm that handles a union of matroid constraints on arm pulling while simultaneously achieving differential privacy. Our model generalizes the arm-pulling models studied in prior BMAB schemes and can be used to address many practical problems, such as network backbone construction and dynamic pricing in crowdsourcing. We handle the exploitation-versus-exploration tradeoff in our BMAB problem by exploiting the combinatorial structure of matroids, and we reduce the search complexity of arm selection with a divide-and-conquer approach. Our algorithm achieves a uniform logarithmic regret bound with respect to B together with ɛ-differential privacy, where B is the budget for pulling arms with random costs. Without differential privacy, our algorithm achieves a uniform logarithmic regret bound with respect to B, which improves upon the asymptotic regret bounds achieved by prior BMAB algorithms. We performed side-by-side comparisons with prior schemes in our experiments. Experimental results show that our purely combinatorial algorithm not only achieves significantly better regret performance but also runs more than 20 times faster than prior BMAB schemes, which rely on time-consuming LP-solving techniques.
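To illustrate the flavor of the approach described in the abstract (not the paper's actual algorithm), the sketch below combines the two ingredients it names: greedy arm selection by a UCB-style index under the simplest matroid constraint (a uniform matroid, i.e., pick at most k arms), and Laplace noise added to the empirical means for ɛ-differential privacy. The function names, the noise scale, and the cardinality constraint are all illustrative assumptions, not details taken from the article.

```python
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) via the inverse-CDF method.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def select_arms(means, counts, t, k, epsilon):
    """Greedily pick up to k arms by a noisy UCB-style index.

    A uniform matroid (any set of size <= k is independent) is the
    simplest matroid constraint; greedy selection on a matroid just
    takes the highest-index arms. The noise scale 1/(epsilon * n) is
    a hypothetical choice for illustration only.
    """
    indices = {}
    for arm, mu in means.items():
        n = counts[arm]
        bonus = math.sqrt(2.0 * math.log(t) / n)       # exploration term
        noise = laplace_noise(1.0 / (epsilon * n))     # privacy noise (assumed scale)
        indices[arm] = mu + bonus + noise
    # Greedy on a uniform matroid: the k largest noisy indices.
    return sorted(indices, key=indices.get, reverse=True)[:k]
```

A richer matroid (e.g., a graphic matroid for backbone construction) would replace the final line with a greedy loop that adds the highest-index arm whose addition keeps the selected set independent.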
Keywords: multi-armed bandit; matroid; budget (search for similar items in EconPapers)
Date: 2020
Downloads: https://doi.org/10.1287/ijoc.2019.0903 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:32:y:2020:i:3:p:790-804
More articles in INFORMS Journal on Computing from INFORMS.
Bibliographic data for series maintained by Chris Asher.