Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Garivier, Aurélien; Ménard, Pierre; Stoltz, Gilles

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier (), Pierre Ménard and Gilles Stoltz ()
Additional contact information
Aurélien Garivier: Institute de Mathématiques de Toulouse (IMT): Université Paul Sabatier—The French National Research Center (CNRS), 31062 Toulouse, France
Pierre Ménard: Institute de Mathématiques de Toulouse (IMT): Université Paul Sabatier—The French National Research Center (CNRS), 31062 Toulouse, France
Gilles Stoltz: HEC Paris Management Research Group (GREGHEC): HEC Paris—CNRS, 78351 Jouy-en-Josas, France

Mathematics of Operations Research, 2019, vol. 44, issue 2, 377-399

Abstract: We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.

Keywords: multiarmed bandits; cumulative regret; information-theoretic proof techniques; nonasymptotic lower bounds (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://doi.org/10.1287/moor.2017.0928 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:44:y:2019:i:2:p:377-399

Access Statistics for this article

More articles in Mathematics of Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().