Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment

Cui, Zihao; Deng, Kailian; Zhang, Hongtao; Zha, Zhongyi; Jobaer, Sayed

Additional contact information
Zihao Cui: College of Information Science and Technology, Donghua University, Shanghai 201620, China
Kailian Deng: College of Information Science and Technology, Donghua University, Shanghai 201620, China
Hongtao Zhang: College of Information Science and Technology, Donghua University, Shanghai 201620, China
Zhongyi Zha: College of Information Science and Technology, Donghua University, Shanghai 201620, China
Sayed Jobaer: College of Information Science and Technology, Donghua University, Shanghai 201620, China

Mathematics, 2025, vol. 13, issue 5, 1-22

Abstract: The development of artificial intelligence (AI) game agents that use deep reinforcement learning (DRL) algorithms to process visual information for decision-making has emerged as a key research focus in both academia and industry. However, previous game agents have struggled to execute multiple commands simultaneously within a single decision, and so fail to replicate the complex control patterns that characterize human gameplay. In this paper, we use the ViZDoom environment as the DRL research platform and model the agent–environment interactions as a Partially Observable Markov Decision Process (POMDP). We introduce an advanced multi-agent DRL framework, Multi-Agent Proximal Policy Optimization (MA-PPO), designed to optimize target acquisition under defined ammunition and time constraints. In MA-PPO, each agent handles distinct parallel tasks with custom reward functions for performance evaluation. The agents make independent decisions while simultaneously executing multiple commands to mimic human-like gameplay behavior. Our evaluation compares MA-PPO against other DRL algorithms and shows a 30.67% performance improvement over the baseline algorithm.
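
As a rough illustration of the idea that several agents decide independently while their commands are executed together, the sketch below merges per-agent choices into one binary command vector. It is not the authors' implementation: the button names, the split of buttons across three agents, and the fixed probability tables standing in for learned policies are all assumptions made for this example.

```python
import random

# Hypothetical button layout; the configuration used in the paper is not given here.
BUTTONS = ["MOVE_LEFT", "MOVE_RIGHT", "TURN_LEFT", "TURN_RIGHT", "ATTACK"]

# Hypothetical split: each agent owns a disjoint subset of buttons (one parallel task each).
AGENT_BUTTONS = {
    "move": ["MOVE_LEFT", "MOVE_RIGHT"],
    "aim": ["TURN_LEFT", "TURN_RIGHT"],
    "fire": ["ATTACK"],
}


def sample_choice(probs, rng):
    """Sample an index from a categorical distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def agent_decision(owned, probs, rng):
    """One agent picks at most one of its own buttons; the last index means 'no-op'."""
    idx = sample_choice(probs, rng)
    return None if idx == len(owned) else owned[idx]


def joint_action(policies, rng):
    """Merge the independent per-agent decisions into one binary command vector."""
    pressed = set()
    for name, owned in AGENT_BUTTONS.items():
        choice = agent_decision(owned, policies[name], rng)
        if choice is not None:
            pressed.add(choice)
    return [1 if button in pressed else 0 for button in BUTTONS]


rng = random.Random(0)
# Stand-in probability tables (last entry = no-op); a trained policy would produce these.
policies = {
    "move": [0.4, 0.4, 0.2],
    "aim": [0.45, 0.45, 0.1],
    "fire": [0.7, 0.3],
}
action = joint_action(policies, rng)
```

Because each agent owns a disjoint set of buttons, the merged vector can press up to one button per agent in the same step, which is the multi-command behavior the abstract describes; how the paper actually partitions commands among agents is not stated in this record.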

Keywords: deep reinforcement learning; convolution neural network; partially observable Markov decision process; multi-agent system
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/5/754/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/5/754/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:5:p:754-:d:1599535
