Percentile optimization in multi-armed bandit problems
Zahra Ghatrani and Archis Ghate
Additional contact information
Zahra Ghatrani: University of Washington
Archis Ghate: University of Washington
Annals of Operations Research, 2024, vol. 340, issue 2, No 5, 837-862
Abstract:
A multi-armed bandit (MAB) problem is described as follows. At each time-step, a decision-maker selects one arm from a finite set. A reward is earned from this arm and the state of that arm evolves stochastically. The goal is to determine an arm-pulling policy that maximizes expected total discounted reward over an infinite horizon. We study MAB problems where the rewards are multivariate Gaussian, to account for data-driven estimation errors. We employ a percentile optimization approach, wherein the goal is to find an arm-pulling policy that maximizes the sum of percentiles of expected total discounted rewards earned from individual arms. The idea is motivated by recent work on percentile optimization in Markov decision processes. We demonstrate that, when applied to MABs, this yields an intractable second-order cone program (SOCP) whose size is exponential in the number of arms. We use Lagrangian relaxation to break the resulting curse of dimensionality. Specifically, we show that the relaxed problem can be reformulated as an SOCP with size linear in the number of arms. We propose three approaches to recover feasible arm-pulling decisions during run-time from an off-line optimal solution of this SOCP. Our numerical experiments suggest that one of these three methods is more effective than the other two.
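As background for why Gaussian rewards and percentiles lead to a second-order cone program, the sketch below evaluates the standard closed-form percentile of a linear function of a Gaussian vector. It is a generic illustration, not the authors' formulation: the function name `gaussian_percentile`, the weight vector `x`, and the risk level `eps` are assumptions introduced here. For r ~ N(mu, Sigma), the eps-percentile of r·x is mu·x minus z times sqrt(x' Sigma x), where z is the standard normal quantile at 1 - eps. For eps at most 0.5 this expression is concave in x, which is what makes a lower bound on it expressible as a second-order cone constraint.

```python
from math import sqrt
from statistics import NormalDist


def gaussian_percentile(mu, sigma, x, eps):
    """Return the eps-percentile of r.x when r ~ N(mu, sigma).

    mu    : list of mean rewards (hypothetical input)
    sigma : covariance matrix as a list of lists (hypothetical input)
    x     : fixed weight vector (assumed given, not optimized here)
    eps   : risk level in (0, 1)
    """
    n = len(x)
    # Mean of the linear form r.x
    mean = sum(mu[i] * x[i] for i in range(n))
    # Variance of the linear form: x' * sigma * x
    var = sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))
    # Standard normal quantile at level 1 - eps
    z = NormalDist().inv_cdf(1.0 - eps)
    return mean - z * sqrt(var)


if __name__ == "__main__":
    mu = [1.0, 2.0]
    sigma = [[1.0, 0.0], [0.0, 4.0]]
    x = [0.5, 0.5]
    print(gaussian_percentile(mu, sigma, x, eps=0.1))
```

With `eps = 0.5` the penalty term vanishes and the function returns the plain expected value; smaller `eps` values give more conservative (lower) estimates.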
Keywords: Dynamic programming; Lagrangian relaxation; Chance-constrained programming
Date: 2024
Downloads: (external link)
http://link.springer.com/10.1007/s10479-024-06165-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:annopr:v:340:y:2024:i:2:d:10.1007_s10479-024-06165-4
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10479
DOI: 10.1007/s10479-024-06165-4
Annals of Operations Research is currently edited by Endre Boros
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.