Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Aurélien Garivier,
Pierre Ménard and
Gilles Stoltz
Additional contact information
Aurélien Garivier: Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier – French National Centre for Scientific Research (CNRS), 31062 Toulouse, France
Pierre Ménard: Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier – French National Centre for Scientific Research (CNRS), 31062 Toulouse, France
Gilles Stoltz: HEC Paris Management Research Group (GREGHEC): HEC Paris – CNRS, 78351 Jouy-en-Josas, France
Mathematics of Operations Research, 2019, vol. 44, issue 2, 377-399
Abstract:
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret holds only in a final phase. The proof techniques go to the essence of the information-theoretic arguments used and involve no unnecessary complications.
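For context only (this sketch is not part of the listing): the "final phase" of logarithmic growth mentioned in the abstract corresponds to the classical distribution-dependent lower bound of Lai and Robbins, which the paper revisits in nonasymptotic form. A minimal statement in LaTeX for Bernoulli bandit models, with assumed notation not taken from the listing (R_T the cumulative regret after T rounds, \mu_a the mean reward of arm a, \mu^\star the best mean, \Delta_a = \mu^\star - \mu_a the gap):

% Asymptotic lower bound (Lai & Robbins, 1985), shown only to illustrate
% the logarithmic "final phase"; notation here is assumed, not taken from
% the listing above.
\[
  \liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\geq\; \sum_{a \,:\, \Delta_a > 0} \frac{\Delta_a}{\mathrm{kl}(\mu_a, \mu^\star)},
  \qquad
  \mathrm{kl}(p, q) = p \log\frac{p}{q} + (1-p) \log\frac{1-p}{1-q},
\]
% where kl denotes the Kullback–Leibler divergence between Bernoulli
% distributions and the sum runs over the suboptimal arms only.

The nonasymptotic bounds of the paper refine this picture by also covering the initial phase, in which the regret of any reasonable strategy must first grow almost linearly in T before this logarithmic regime sets in.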
Keywords: multiarmed bandits; cumulative regret; information-theoretic proof techniques; nonasymptotic lower bounds
Date: 2019
Citations: 2
Downloads: https://doi.org/10.1287/moor.2017.0928 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:44:y:2019:i:2:p:377-399
More articles in Mathematics of Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.