Bandits with Global Convex Constraints and Objective
Shipra Agrawal () and
Nikhil R. Devanur ()
Additional contact information
Shipra Agrawal: Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Nikhil R. Devanur: Microsoft Research, Redmond, Washington 98052
Operations Research, 2019, vol. 67, issue 5, 1486-1502
Abstract:
We consider a very general model for managing the exploration–exploitation trade-off, which allows global convex constraints and concave objective on the aggregate decisions over time in addition to the customary limitation on the time horizon. This model provides a natural framework to study many sequential decision-making problems with long-term convex constraints and concave utility and subsumes the classic multiarmed bandit (MAB) model and the bandits with knapsacks problem as special cases. We demonstrate that a natural extension of the upper confidence bound family of algorithms for MAB provides a polynomial time algorithm with near-optimal regret guarantees for this substantially more general model. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems/algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank–Wolfe technique for convex optimization. We give several concrete examples of applications, particularly in risk-sensitive revenue management under unknown demand distributions, in which this more general bandit model of sequential decision making allows for richer formulations and more efficient solutions of the problem.
Keywords: exploration–exploitation; sequential decision making under uncertainty; multiarmed bandit; online learning; convex optimization (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)
Downloads: (external link)
https://doi.org/opre.2019.1840 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:67:y:2019:i:5:p:1486-1502
Access Statistics for this article
More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().