Small-Loss Bounds for Online Learning with Partial Information

Thodoris Lykouris, Karthik Sridharan and Éva Tardos
Additional contact information
Thodoris Lykouris: Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Karthik Sridharan: Cornell University, Ithaca, New York 14850
Éva Tardos: Cornell University, Ithaca, New York 14850

Mathematics of Operations Research, 2022, vol. 47, issue 3, 2186-2218

Abstract: We consider the problem of adversarial (nonstochastic) online learning with partial-information feedback, in which, at each round, a decision maker selects an action from a finite set of alternatives. We develop a black-box approach for such problems in which the learner observes as feedback only the losses of a subset of the actions that includes the selected action. When losses of actions are nonnegative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called “small-loss” o(αL⋆) regret bounds with high probability, where α is the independence number of the feedback graph and L⋆ is the loss of the best action. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (i.e., a guarantee without dependence on the number of actions that exploits the increased feedback). Taking advantage of the black-box nature of our technique, we extend our results to many other applications, such as combinatorial semi-bandits (including routing in networks), contextual bandits (even with an infinite comparator class), and learning with slowly changing (shifting) comparators. In the special case of multi-armed bandit and combinatorial semi-bandit problems, we provide optimal small-loss, high-probability regret guarantees of Õ(√(dL⋆)), where d is the number of actions, answering open questions of Neu. Previous bounds for multi-armed bandits and semi-bandits were known only for pseudo-regret and only in expectation. We also offer an optimal Õ(√(κL⋆)) regret guarantee for fixed feedback graphs with clique-partition number at most κ.
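The graph-feedback protocol described in the abstract can be sketched in code. The following is an illustrative Exp3-style learner with importance-weighted loss estimates over the feedback graph — a standard construction in this literature, not the authors' small-loss algorithm; the function name `exp3_graph` and its parameters are hypothetical names chosen for this sketch.

```python
import math
import random

def exp3_graph(losses, observe, eta):
    """Illustrative Exp3-style learner under graph feedback.

    losses:  T x K matrix of per-round losses in [0, 1]
    observe: observe[j] is the set of actions whose losses are revealed
             when action j is played (always contains j itself)
    eta:     learning rate

    Returns the learner's total realized loss over the T rounds.
    """
    T, K = len(losses), len(losses[0])
    w = [1.0] * K          # multiplicative weights over actions
    total = 0.0
    for t in range(T):
        s = sum(w)
        p = [wi / s for wi in w]
        a = random.choices(range(K), weights=p)[0]  # play a ~ p
        total += losses[t][a]
        for i in observe[a]:
            # Probability that action i's loss is observed this round:
            # the chance of playing any j whose feedback set contains i.
            Pi = sum(p[j] for j in range(K) if i in observe[j])
            est = losses[t][i] / Pi   # importance-weighted loss estimate
            w[i] *= math.exp(-eta * est)
    return total
```

With the complete feedback graph (every action observes all losses) this reduces to Hedge; with self-loops only (each action observes just its own loss) it is standard Exp3, so the graph interpolates between full information and bandit feedback.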

Keywords: Primary: 68Q32; online learning; feedback graphs; bandit algorithms; semi-bandits; contextual bandits; partial information; regret bounds; small-loss bounds; first-order bounds; high probability
Date: 2022

Downloads: http://dx.doi.org/10.1287/moor.2021.1204 (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:47:y:2022:i:3:p:2186-2218


More articles in Mathematics of Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:ormoor:v:47:y:2022:i:3:p:2186-2218