A Collaborative Multi-Agent Reinforcement Learning Approach for Non-Stationary Environments with Unknown Change Points
Suyu Wang,
Quan Yue,
Zhenlei Xu,
Peihong Qiao,
Zhentao Lyu and
Feng Gao
Additional contact information
Suyu Wang: School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Quan Yue: School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Zhenlei Xu: School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Peihong Qiao: School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Zhentao Lyu: School of Mechanical and Electrical Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China
Feng Gao: Beijing Huatie Information Technology Co., Ltd., Beijing 100081, China
Mathematics, 2025, vol. 13, issue 11, 1-25
Abstract:
Reinforcement learning has achieved significant success in sequential decision-making problems but adapts poorly to non-stationary environments with unknown dynamics, a challenge that is particularly pronounced in multi-agent scenarios. This study aims to enhance the adaptive capability of multi-agent systems in such volatile environments. We propose a novel cooperative Multi-Agent Reinforcement Learning (MARL) algorithm based on MADDPG, termed MACPH, which incorporates three mechanisms: a Composite Experience Replay Buffer (CERB) that balances recent and important historical experiences through a dual-buffer structure and mixed sampling; Adaptive Parameter Space Noise (APSN), which perturbs actor network parameters and dynamically adjusts the perturbation intensity to achieve coherent, state-dependent exploration; and a Huber loss that mitigates the impact of outliers in temporal-difference errors and improves training stability. Experiments were conducted in standard and non-stationary navigation and communication task scenarios. Ablation studies confirmed the positive contribution of each component and their synergistic effects. In non-stationary scenarios featuring abrupt environmental changes, MACPH outperforms baseline algorithms such as DDPG, MADDPG, and MATD3 in reward, adaptation speed, learning stability, and robustness. The proposed MACPH algorithm offers an effective solution for multi-agent reinforcement learning applications in complex non-stationary environments.
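For readers who want a concrete picture of the three mechanisms summarized above, the following minimal Python sketch illustrates one plausible reading of each: a dual-buffer replay structure with mixed sampling (CERB), adaptive parameter-space noise applied to the actor weights (APSN), and a Huber loss on temporal-difference errors. All class names, parameter names, and default values (CompositeReplayBuffer, AdaptiveParamNoise, recent_fraction, target_distance, delta, and so on) are illustrative assumptions, not the authors' implementation, which is described in the full paper.

    import random
    from collections import deque

    import numpy as np


    class CompositeReplayBuffer:
        # Dual-buffer structure (illustrative): a small buffer of recent transitions
        # plus a larger historical buffer, sampled jointly with a fixed mixing ratio.
        def __init__(self, recent_size=10000, history_size=100000, recent_fraction=0.5):
            self.recent = deque(maxlen=recent_size)
            self.history = deque(maxlen=history_size)
            self.recent_fraction = recent_fraction  # share of each batch drawn from recent data

        def add(self, transition):
            self.recent.append(transition)
            self.history.append(transition)

        def sample(self, batch_size):
            n_recent = min(int(batch_size * self.recent_fraction), len(self.recent))
            n_history = min(batch_size - n_recent, len(self.history))
            return (random.sample(list(self.recent), n_recent)
                    + random.sample(list(self.history), n_history))


    class AdaptiveParamNoise:
        # Parameter-space noise (illustrative): perturb actor weights directly and
        # adapt the noise scale so the perturbed policy stays near a target distance
        # from the unperturbed policy in action space.
        def __init__(self, initial_std=0.1, target_distance=0.2, adapt_coeff=1.01):
            self.std = initial_std
            self.target_distance = target_distance
            self.adapt_coeff = adapt_coeff

        def perturb(self, weights):
            # weights: list of numpy arrays holding the actor parameters
            return [w + np.random.normal(0.0, self.std, size=w.shape) for w in weights]

        def adapt(self, action_distance):
            # Shrink the noise if the perturbed policy drifted too far, grow it otherwise.
            if action_distance > self.target_distance:
                self.std /= self.adapt_coeff
            else:
                self.std *= self.adapt_coeff


    def huber_loss(td_errors, delta=1.0):
        # Huber loss on TD errors: quadratic near zero, linear for outliers.
        td_errors = np.asarray(td_errors, dtype=float)
        quad = np.minimum(np.abs(td_errors), delta)
        lin = np.abs(td_errors) - quad
        return float(np.mean(0.5 * quad ** 2 + delta * lin))

In a MADDPG-style update, each agent's critic would then be trained on batches drawn via CompositeReplayBuffer.sample with the Huber loss applied to its TD errors, while exploration actions would come from a perturbed copy of the actor maintained by AdaptiveParamNoise; this is a sketch of how the pieces could fit together, not the paper's exact training loop.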
Keywords: non-stationary environments; multi-agent reinforcement learning; unknown change points; composite experience replay buffer; adaptive exploration (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/11/1738/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/11/1738/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:11:p:1738-:d:1663584