Exploration and Incentives in Reinforcement Learning
Max Simchowitz and
Aleksandrs Slivkins
Additional contact information
Max Simchowitz: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Aleksandrs Slivkins: Microsoft Research NYC, New York, New York 10012
Operations Research, 2024, vol. 72, issue 3, 983-998
Abstract:
How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems in which each agent faces the same (but unknown) Markov decision process (MDP). In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information and can incentivize the agents to explore via information asymmetry. We design an algorithm that explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful reinforcement-learning setting.
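The recommendation-based protocol described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's algorithm: the planner class, the `epsilon` sampling rate, and the policy/return interface are hypothetical stand-ins. It shows the structural point that each agent observes only its own recommendation while the planner aggregates all past outcomes, so exploratory recommendations can be interleaved with exploitative ones.

```python
import random

class RecommendationPlanner:
    """Hypothetical sketch of recommendation-based incentivized exploration.

    Each arriving agent receives one policy recommendation. Because the
    planner sees all past trajectories and an individual agent sees only
    its own recommendation, exploratory recommendations can be hidden
    among exploitative ones (information asymmetry).
    """

    def __init__(self, candidate_policies, epsilon=0.05):
        self.candidates = list(candidate_policies)  # policies not yet tried
        self.history = []                           # (policy, return) pairs
        self.epsilon = epsilon                      # illustrative exploration rate

    def recommend(self):
        """Return a policy recommendation for the next arriving agent."""
        if not self.history:
            # Bootstrap: with no data, any recommendation is exploratory.
            return self.candidates.pop() if self.candidates else None
        if self.candidates and random.random() < self.epsilon:
            # Occasionally recommend an untried policy. The agent cannot
            # distinguish this from an exploit recommendation, which is what
            # makes following along incentive-compatible (for suitable
            # priors and a small enough epsilon).
            return self.candidates.pop()
        # Otherwise recommend the empirically best policy observed so far.
        return max(self.history, key=lambda pr: pr[1])[0]

    def record(self, policy, episodic_return):
        # The planner observes the episode's return; other agents do not.
        self.history.append((policy, episodic_return))
```

A usage loop would call `recommend()` once per arriving agent, run the recommended policy in the shared MDP for one episode, and feed the realized return back via `record()`.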
Keywords: Market Analytics and Revenue Management; incentivized exploration; exploration-exploitation tradeoff; mechanism design; information design; information asymmetry; Bayesian incentive-compatibility; reinforcement learning; Markov decision processes
Date: 2024
Downloads: http://dx.doi.org/10.1287/opre.2022.0495 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:3:p:983-998
More articles in Operations Research from INFORMS.