V-Learning—A Simple, Efficient, Decentralized Algorithm for Multiagent Reinforcement Learning
Chi Jin,
Qinghua Liu,
Yuanhao Wang and
Tiancheng Yu
Additional contact information
Chi Jin: Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544
Qinghua Liu: Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544
Yuanhao Wang: Department of Computer Science, Princeton University, Princeton, New Jersey 08544
Tiancheng Yu: Department of Electrical and Computer Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Mathematics of Operations Research, 2024, vol. 49, issue 4, 2295-2322
Abstract:
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms, even in the basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i \in [m]} A_i$, where $A_i$ is the number of actions for the $i$th player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^{m} A_i$. V-learning (in its basic form) is a new class of single-agent reinforcement learning (RL) algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Like the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains only estimates of V-values rather than Q-values. This key difference allows V-learning to achieve its guarantees in the MARL setting by simply letting all agents run V-learning independently.
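To make the update concrete, below is a minimal sketch of one agent's V-learning step in Python, assuming a tabular episodic environment with horizon H, S states, and A actions for this agent. The bandit subroutine is an exponential-weights (EXP3-style) learner, one example of an adversarial bandit with suitable regret guarantees; the constants `alpha`, `beta`, and `eta` are illustrative placeholders rather than the paper's exact choices, and `env.step` is a hypothetical environment call.

```python
import numpy as np

H, S, A = 10, 20, 4                       # horizon, states, this agent's action count
V = np.full((H + 1, S), float(H))         # optimistic V-value estimates; step H is terminal
V[H, :] = 0.0
n = np.zeros((H, S), dtype=int)           # visit counts per (step, state)
logits = np.zeros((H, S, A))              # exponential-weights bandit kept per (step, state)

def policy(h, s):
    """Current bandit distribution over this agent's own actions at (h, s)."""
    w = np.exp(logits[h, s] - logits[h, s].max())
    return w / w.sum()

def vlearning_update(h, s, a, r, s_next):
    """One incremental V-learning step after observing (s, a, r, s_next) at step h."""
    n[h, s] += 1
    t = n[h, s]
    alpha = (H + 1) / (H + t)                          # stage-based learning rate
    beta = 0.5 * np.sqrt(H**3 * np.log(S * A) / t)     # illustrative optimism bonus
    target = r + V[h + 1, s_next] + beta
    V[h, s] = min(float(H), (1 - alpha) * V[h, s] + alpha * target)
    # Feed an importance-weighted loss for the chosen action to the bandit,
    # so it sees an unbiased estimate of the full loss vector.
    p = policy(h, s)
    loss = (H - r - V[h + 1, s_next]) / H              # normalized loss
    eta = np.sqrt(np.log(A) / (A * t))                 # illustrative bandit step size
    logits[h, s, a] -= eta * loss / max(p[a], 1e-8)

# Hypothetical interaction at step h in state s; each agent samples from its own
# bandit, sees only its own reward, and never observes other agents' actions:
#   a = np.random.choice(A, p=policy(h, s))
#   r, s_next = env.step(a)
#   vlearning_update(h, s, a, r, s_next)
```

Because each agent's tables are indexed only by its own $A_i$ actions, running this routine independently for every agent sidesteps the $\prod_{i=1}^{m} A_i$ joint action space; the paper additionally extracts the final output policy from the history of these bandit distributions, a step this sketch omits.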
Keywords: Primary: 93E35; secondary: 62L05; V-learning; Markov games; multiagent reinforcement learning; decentralized reinforcement learning; Nash equilibria; (coarse) correlated equilibria
Date: 2024
Downloads: http://dx.doi.org/10.1287/moor.2021.0317 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:49:y:2024:i:4:p:2295-2322