Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
Weichao Mao () and
Tamer Başar ()
Additional contact information
Weichao Mao: University of Illinois Urbana-Champaign
Tamer Başar: University of Illinois Urbana-Champaign
Dynamic Games and Applications, 2023, vol. 13, issue 1, No 7, 165-186
Abstract:
Abstract This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents’ strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $$\epsilon $$ ϵ -approximate CCE in at most $$\widetilde{O}( H^6S A /\epsilon ^2)$$ O ~ ( H 6 S A / ϵ 2 ) episodes, where S is the number of states, A is the size of the largest individual action space, and H is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is decentralized, in the sense that each agent has access to only its local information, and is completely oblivious to the presence of others. This way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents.
Keywords: Markov game; Reinforcement learning; Coarse correlated equilibrium; Sample complexity (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s13235-021-00420-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:dyngam:v:13:y:2023:i:1:d:10.1007_s13235-021-00420-0
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/13235
DOI: 10.1007/s13235-021-00420-0
Access Statistics for this article
Dynamic Games and Applications is currently edited by Georges Zaccour
More articles in Dynamic Games and Applications from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().