On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

Ezra Tampubolon, Haris Ceribasic and Holger Boche

Abstract: In this work, we study a system of two interacting, non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur among independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy. We also discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
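
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a stateless, repeated 2x2 matrix game with hypothetical payoffs, epsilon-greedy exploration, and a Q-update without a bootstrap term. The names `UNINFORMED_PAYOFF`, `INFORMED_PAYOFF`, `train`, and `greedy` are chosen here for illustration only. The uninformed agent keeps one Q-value per own action, while the informed agent observes the other's action first and keeps one Q-value per (observed action, own action) pair.

```python
import random

# Hypothetical 2x2 payoffs (an assumption, not taken from the paper),
# indexed as [uninformed_action][informed_action].
UNINFORMED_PAYOFF = [[2.0, 0.0], [0.0, 1.0]]
INFORMED_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def greedy(q_values):
    """Return the index of the largest Q-value (lowest index wins ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def train(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Run both learners simultaneously on the repeated matrix game."""
    rng = random.Random(seed)
    q_uninformed = [0.0, 0.0]              # Q(a): sees nothing of the opponent
    q_informed = [[0.0, 0.0], [0.0, 0.0]]  # Q(a, b): conditions on observed a
    for _ in range(steps):
        a = epsilon_greedy(q_uninformed, epsilon, rng)
        # Information asymmetry: the informed agent sees a before choosing b.
        b = epsilon_greedy(q_informed[a], epsilon, rng)
        # Stateless Q-learning update (no next-state term in this toy game).
        q_uninformed[a] += alpha * (UNINFORMED_PAYOFF[a][b] - q_uninformed[a])
        q_informed[a][b] += alpha * (INFORMED_PAYOFF[a][b] - q_informed[a][b])
    return q_uninformed, q_informed


if __name__ == "__main__":
    q_u, q_i = train()
    print("uninformed greedy action:", greedy(q_u))
    print("informed greedy response per observed action:",
          [greedy(row) for row in q_i])
```

In this toy game the informed agent is rewarded for matching the observed action, so its greedy response tends to copy whatever it sees, and the uninformed agent then tends to prefer the action with the higher matched payoff. Neither agent gains by deviating unilaterally from that pair of policies here, which is the kind of stable outcome the abstract refers to, although the sketch says nothing about the paper's general convergence results.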

Date: 2020-10, Revised 2021-01
New Economics Papers: this item is included in nep-mic

Downloads: (external link)
http://arxiv.org/pdf/2010.10901 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2010.10901

Series: Papers from arXiv.org