On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
Ezra Tampubolon, Haris Ceribasic and Holger Boche
Papers from arXiv.org
Abstract:
In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of fully independent learners. The resulting post-learning policies are almost optimal with respect to the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
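As a loose illustration of the setting the abstract describes (not the paper's algorithm), the sketch below runs two tabular Q-learners in a repeated 2x2 game where the privileged agent observes the opponent's current action before choosing and conditions its Q-values on it. The payoff matrix, learning rate, and exploration rate are illustrative assumptions.

```python
import random

# Hypothetical prisoner's-dilemma-style payoff bimatrix (not from the paper):
# entry [a1][a2] = (reward to agent 1, reward to agent 2).
PAYOFFS = [[(3, 3), (0, 5)],
           [(5, 0), (1, 1)]]

ALPHA, EPS, ROUNDS = 0.1, 0.1, 5000

# Agent 1 learns independently: one Q-value per own action.
q1 = [0.0, 0.0]
# Agent 2 observes agent 1's action before acting, so its Q-table is
# conditioned on that observation -- the information asymmetry.
q2 = [[0.0, 0.0], [0.0, 0.0]]

def eps_greedy(qs):
    # Explore uniformly with probability EPS, otherwise exploit.
    if random.random() < EPS:
        return random.randrange(len(qs))
    return max(range(len(qs)), key=lambda a: qs[a])

random.seed(0)
for _ in range(ROUNDS):
    a1 = eps_greedy(q1)
    a2 = eps_greedy(q2[a1])       # privileged agent reacts to the observed a1
    r1, r2 = PAYOFFS[a1][a2]
    q1[a1] += ALPHA * (r1 - q1[a1])            # stateless Q-update
    q2[a1][a2] += ALPHA * (r2 - q2[a1][a2])

print(q1, q2)
```

With these payoffs the privileged agent learns a best response to each observed action, and the independent agent learns a best response to that behaviour, so play settles near the stage game's Nash equilibrium, consistent with the stability the paper attributes to information asymmetry.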
Date: 2020-10, Revised 2021-01
New Economics Papers: this item is included in nep-mic
Downloads: http://arxiv.org/pdf/2010.10901 (application/pdf, latest version)
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2010.10901