Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning

Siying Wang, Wenyu Chen, Jian Hu, Siyue Hu and Liwei Huang
Additional contact information
Siying Wang: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Wenyu Chen: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Jian Hu: Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 106, Taiwan
Siyue Hu: Department of Computer Science & Information Engineering, National Taiwan University, Taipei 106, Taiwan
Liwei Huang: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Mathematics, 2022, vol. 10, issue 15, 1-15

Abstract: Leveraging global state information to enhance policy optimization is a common approach in multi-agent reinforcement learning (MARL). Even with this additional state information, agents still suffer from insufficient exploration during training. Moreover, training with batch-sampled examples from the replay buffer induces policy overfitting, i.e., multi-agent proximal policy optimization (MAPPO) may not perform as well as independent PPO (IPPO) even with the additional information in its centralized critic. In this paper, we propose a novel noise-injection method to regularize the policies of agents and mitigate the overfitting issue. We analyze the cause of policy overfitting in actor–critic MARL and design two specific patterns of injecting random Gaussian noise into the advantage function to stabilize training and enhance performance. Experimental results on the Matrix Game and StarCraft II show the higher training efficiency and superior performance of our method, and ablation studies indicate that our method maintains higher entropy in the agents’ policies during training, which leads to more exploration.
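
The abstract describes injecting random Gaussian noise into the advantage function before the policy update. The snippet below is a minimal sketch of that general idea in Python/NumPy; the function name noise_regularized_advantage, the noise scale sigma, and the element-wise injection pattern are illustrative assumptions and do not reproduce the paper's two specific injection patterns.

    import numpy as np

    def noise_regularized_advantage(advantages, sigma=0.1, rng=None):
        # Add zero-mean Gaussian noise element-wise to the advantage
        # estimates before they enter the PPO surrogate objective.
        # sigma and the element-wise pattern are assumptions for illustration.
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.normal(loc=0.0, scale=sigma, size=advantages.shape)
        return advantages + noise

    # Example: perturb a batch of advantage estimates sampled from the buffer.
    adv = np.array([0.5, -0.2, 1.3, 0.0])
    print(noise_regularized_advantage(adv, sigma=0.05))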

Keywords: multi-agent reinforcement learning; proximal policy optimization; exploration; noise injection; advantage function
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/15/2728/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/15/2728/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:15:p:2728-:d:878489

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jmathe:v:10:y:2022:i:15:p:2728-:d:878489