An asymptotically optimal strategy for constrained multi-armed bandit problems
Hyeong Soo Chang ()
Additional contact information
Hyeong Soo Chang: Sogang University
Mathematical Methods of Operations Research, 2020, vol. 91, issue 3, No 6, 545-557
Abstract:
Abstract This note considers the model of “constrained multi-armed bandit” (CMAB) that generalizes that of the classical stochastic MAB by adding a feasibility constraint for each action. The feasibility is in fact another (conflicting) objective that should be kept in order for a playing-strategy to achieve the optimality of the main objective. While the stochastic MAB model is a special case of the Markov decision process (MDP) model, the CMAB model is a special case of the constrained MDP model. For the asymptotic optimality measured by the probability of choosing an optimal feasible arm over infinite horizon, we show that the optimality is achievable by a simple strategy extended from the $$\epsilon _t$$ϵt-greedy strategy used for unconstrained MAB problems. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time t goes to infinity. A particular example sequence of $$\{\epsilon _t\}$${ϵt} having the asymptotic convergence rate in the order of $$(1-\frac{1}{t})^4$$(1-1t)4 that holds from a sufficiently large t is also discussed.
Keywords: Multi-armed bandit; Constrained stochastic optimization; Simulation optimization; Constrained Markov decision process (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s00186-019-00697-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:mathme:v:91:y:2020:i:3:d:10.1007_s00186-019-00697-3
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/00186
DOI: 10.1007/s00186-019-00697-3
Access Statistics for this article
Mathematical Methods of Operations Research is currently edited by Oliver Stein
More articles in Mathematical Methods of Operations Research from Springer, Gesellschaft für Operations Research (GOR), Nederlands Genootschap voor Besliskunde (NGB)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().