EconPapers    
Economics at your fingertips  
 

Phase Transitions in Bandits with Switching Constraints

David Simchi-Levi () and Yunzong Xu ()
Additional contact information
David Simchi-Levi: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Yunzong Xu: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Microsoft Research, New York, New York 10012; Department of Industrial and Enterprise Systems Engineering, University of Illinois, Urbana-Champaign, Illinois 61801

Management Science, 2023, vol. 69, issue 12, 7182-7201

Abstract: We consider the classic stochastic multiarmed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and lower bounds on the optimal (i.e., minimax) regret and provide efficient rate-optimal algorithms. Surprisingly, the optimal regret of this problem exhibits a nonconventional growth rate in terms of the time horizon and the number of arms. Consequently, we discover surprising “phase transitions” regarding how the optimal regret rate changes with respect to the switching budget: when the number of arms is fixed, there are equal-length phases, in which the optimal regret rate remains (almost) the same within each phase and exhibits abrupt changes between phases; when the number of arms grows with the time horizon, such abrupt changes become subtler and may disappear, but a generalized notion of phase transitions involving certain new measurements still exists. The results enable us to fully characterize the trade-off between the regret rate and the incurred switching cost in the stochastic multiarmed bandit problem, contributing new insights to this fundamental problem. Under the general switching cost structure, the results reveal interesting connections between bandit problems and graph traversal problems, such as the shortest Hamiltonian path problem.

Keywords: multiarmed bandit; switching constraint; phase transition; regret analysis; minimax lower bound; graph traversal (search for similar items in EconPapers)
Date: 2023
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2023.4755 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:69:y:2023:i:12:p:7182-7201

Access Statistics for this article

More articles in Management Science from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:ormnsc:v:69:y:2023:i:12:p:7182-7201