
MIRA: Model-Based Imagined Rollouts Augmentation for Non-Stationarity in Multi-Agent Systems

Haotian Xu, Qi Fang, Cong Hu, Yue Hu () and Quanjun Yin
Additional contact information
Haotian Xu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Qi Fang: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Cong Hu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Yue Hu: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Quanjun Yin: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China

Mathematics, 2022, vol. 10, issue 17, 1-22

Abstract: One of the challenges in multi-agent systems is environmental non-stationarity: the policies of all agents evolve individually over time, so the environment each agent faces keeps changing. Many multi-agent reinforcement learning (MARL) methods have been proposed to address this problem, but they rely on large amounts of training data, and some require intensive communication among agents, which is often impractical in real-world applications. To better tackle the non-stationarity problem, this article combines model-based reinforcement learning (MBRL) with meta-learning and proposes a method called Model-based Imagined Rollouts Augmentation (MIRA). Using a learned environment dynamics model, distributed agents can independently perform multi-agent rollouts with opponent models during exploitation and learn to infer the environmental non-stationarity as a latent variable from those rollouts. On top of the world model and the latent-variable inference module, a multi-agent soft actor-critic implementation supports centralized training with decentralized decision making. Empirical results on the Multi-agent Particle Environment (MPE) show that the algorithm achieves considerably better sample efficiency and higher convergent rewards than state-of-the-art MARL methods, including COMA, MAAC, MADDPG, and VDN.
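The core loop described in the abstract — rolling a learned dynamics model forward with opponent models and compressing the imagined trajectory into a latent variable that summarizes the non-stationarity — can be sketched in a toy form. This is a minimal illustrative sketch, not the paper's implementation: the actual MIRA components are learned neural networks, whereas all matrices, dimensions, and function names below (`W_dyn`, `imagined_rollout`, `infer_latent`, etc.) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and random "learned" parameters (stand-ins
# for the paper's neural world model, opponent models, and encoder).
OBS_DIM, ACT_DIM, N_OPPONENTS, LATENT_DIM, HORIZON = 4, 2, 2, 3, 5

W_dyn = rng.normal(scale=0.1, size=(OBS_DIM, OBS_DIM + ACT_DIM * (1 + N_OPPONENTS)))
W_opp = rng.normal(scale=0.1, size=(N_OPPONENTS, ACT_DIM, OBS_DIM))
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM + ACT_DIM * N_OPPONENTS))

def opponent_actions(obs):
    # Opponent models: predict every other agent's action from the observation.
    return np.tanh(W_opp @ obs)                     # shape (N_OPPONENTS, ACT_DIM)

def imagined_rollout(obs, policy, horizon=HORIZON):
    # Roll the learned dynamics model forward without querying the real
    # environment ("imagined" multi-agent rollout).
    traj = []
    for _ in range(horizon):
        own = policy(obs)
        opp = opponent_actions(obs)
        traj.append((obs, own, opp))
        x = np.concatenate([obs, own, opp.ravel()])
        obs = np.tanh(W_dyn @ x)                    # predicted next observation
    return traj

def infer_latent(traj):
    # Encode the imagined trajectory into a latent variable intended to
    # capture how the opponents behave, i.e. the non-stationarity.
    feats = [np.concatenate([o, opp.ravel()]) for o, _, opp in traj]
    return np.tanh(W_enc @ np.mean(feats, axis=0))  # shape (LATENT_DIM,)

policy = lambda obs: np.tanh(obs[:ACT_DIM])         # placeholder for the agent's policy
traj = imagined_rollout(rng.normal(size=OBS_DIM), policy)
z = infer_latent(traj)
```

In the full method, the latent `z` would condition the soft actor-critic policy and critics, so each agent can adapt its decentralized decisions to the inferred behavior of the others.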

Keywords: multi-agent system; non-stationarity; model-based reinforcement learning; meta-learning (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/17/3059/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/17/3059/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:17:p:3059-:d:896857

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:10:y:2022:i:17:p:3059-:d:896857