EconPapers    
Economics at your fingertips  
 

Online Optimization in X-Armed Bandits

Sébastien Bubeck (), Rémi Munos (), Gilles Stoltz and Csaba Szepesvari ()
Additional contact information
Sébastien Bubeck: SEQUEL - Sequential Learning - LIFL - Laboratoire d'Informatique Fondamentale de Lille - Université de Lille, Sciences et Technologies - Inria - Institut National de Recherche en Informatique et en Automatique - Université de Lille, Sciences Humaines et Sociales - CNRS - Centre National de la Recherche Scientifique - Centre Inria de l'Université de Lille - Inria - Institut National de Recherche en Informatique et en Automatique - LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal - Université de Lille, Sciences et Technologies - Centrale Lille - CNRS - Centre National de la Recherche Scientifique
Rémi Munos: SEQUEL - Sequential Learning - LIFL - Laboratoire d'Informatique Fondamentale de Lille - Université de Lille, Sciences et Technologies - Inria - Institut National de Recherche en Informatique et en Automatique - Université de Lille, Sciences Humaines et Sociales - CNRS - Centre National de la Recherche Scientifique - Centre Inria de l'Université de Lille - Inria - Institut National de Recherche en Informatique et en Automatique - LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal - Université de Lille, Sciences et Technologies - Centrale Lille - CNRS - Centre National de la Recherche Scientifique
Csaba Szepesvari: Department of Computing Science [Edmonton] - University of Alberta

Post-Print from HAL

Abstract: We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Holder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.

Date: 2008-12-08
Note: View the original document on HAL open archive server: https://inria.hal.science/inria-00329797v1
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Published in Twenty-Second Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada

Downloads: (external link)
https://inria.hal.science/inria-00329797v1/document (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:inria-00329797

Access Statistics for this paper

More papers in Post-Print from HAL
Bibliographic data for series maintained by CCSD ().

 
Page updated 2025-03-31
Handle: RePEc:hal:journl:inria-00329797