MIRA: Model-Based Imagined Rollouts Augmentation for Non-Stationarity in Multi-Agent Systems
Haotian Xu,
Qi Fang,
Cong Hu,
Yue Hu () and
Quanjun Yin
Additional contact information
Haotian Xu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Qi Fang: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Cong Hu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Yue Hu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Quanjun Yin: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Mathematics, 2022, vol. 10, issue 17, 1-22
Abstract:
One of the challenges in multi-agent systems stems from environmental non-stationarity: the policies of all agents evolve individually over time. Many multi-agent reinforcement learning (MARL) methods have been proposed to address this problem, but they rely on large amounts of training data, and some require agents to communicate intensively, which is often impractical in real-world applications. To better tackle the non-stationarity problem, this article combines model-based reinforcement learning (MBRL) with meta-learning and proposes a method called Model-based Imagined Rollouts Augmentation (MIRA). Using an environment dynamics model, distributed agents can independently perform multi-agent rollouts with opponent models during exploitation and learn to infer the environmental non-stationarity as a latent variable from these rollouts. On top of the world model and the latent-variable inference module, we implement multi-agent soft actor-critic for centralized training and decentralized decision making. Empirical results on the Multi-agent Particle Environment (MPE) show that the algorithm achieves considerably better sample efficiency and higher convergent rewards than state-of-the-art MARL methods, including COMA, MAAC, MADDPG, and VDN.
Keywords: multi-agent system; non-stationarity; model-based reinforcement learning; meta-learning (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View complete reference list from CitEc
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/17/3059/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/17/3059/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:17:p:3059-:d:896857
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().