Optimization of Predefined-Time Agent-Scheduling Strategy Based on PPO
Dingding Qi,
Yingjun Zhao,
Longyue Li and
Zhanxiao Jia
Additional contact information
Dingding Qi: Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China
Yingjun Zhao: Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China
Longyue Li: Air Defense and Anti-Missile School, Air Force Engineering University, Xi’an 710043, China
Zhanxiao Jia: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China
Mathematics, 2024, vol. 12, issue 15, 1-17
Abstract:
In this paper, we introduce an agent rescue scheduling approach grounded in proximal policy optimization (PPO), coupled with a singularity-free predefined-time control strategy, with the primary objective of improving the efficiency and precision of rescue missions. Firstly, we design an evaluation function closely tied to the average flying distance of the agents, which provides a quantitative benchmark for comparing scheduling schemes and helps optimize the allocation of rescue resources. Secondly, we develop a scheduling-strategy optimization method based on the PPO algorithm; it automatically learns and adjusts scheduling strategies to adapt to complex rescue environments and varying task demands, with the evaluation function supplying the feedback signal that lets PPO fine-tune the scheduling strategies toward optimal results. Thirdly, to steer agents stably and precisely to their designated positions, we formulate a singularity-free predefined-time fuzzy adaptive tracking control strategy that dynamically adjusts its control parameters in response to external disturbances and uncertainties, ensuring that each agent reaches its destination within the predefined time. Finally, to validate the proposed approach, we build a simulation environment in Python 3.7 and compare PPO against another optimization method, the Deep Q-Network (DQN), using the variation in reward values as the evaluation benchmark.
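The article itself ships no code; as a minimal sketch of the two learning ingredients the abstract describes, the Python fragment below pairs a distance-based evaluation function with the standard PPO clipped surrogate loss that would consume its reward signal. The function names (`evaluate_schedule`, `ppo_clip_loss`) and the exact reward shape (negative average flying distance) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import torch

def average_flying_distance(agent_xy: np.ndarray, target_xy: np.ndarray) -> float:
    """Mean Euclidean distance agents must fly under a given assignment.

    agent_xy, target_xy: (n_agents, 2) arrays; row i pairs agent i with
    the rescue point the schedule assigns to it.
    """
    return float(np.linalg.norm(agent_xy - target_xy, axis=1).mean())

def evaluate_schedule(agent_xy: np.ndarray, target_xy: np.ndarray) -> float:
    """Reward for the scheduler: shorter average flight distance scores higher."""
    return -average_flying_distance(agent_xy, target_xy)

def ppo_clip_loss(new_logp: torch.Tensor,
                  old_logp: torch.Tensor,
                  advantages: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate: limits how far one update moves the policy."""
    ratio = torch.exp(new_logp - old_logp)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # negate: maximize surrogate
```

The clipping threshold `eps = 0.2` is the common PPO default; the reward's sign convention simply makes shorter average flight distances score higher, matching the benchmark the abstract describes.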
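The paper's singularity-free predefined-time fuzzy adaptive tracking controller is likewise not reproduced here. As a toy illustration of the predefined-time property alone, the following simulates a textbook predefined-time stabilizer for a scalar integrator (a standard form from the predefined-time control literature, not the authors' control law): for any initial condition, the state settles before the user-chosen bound T_c.

```python
import numpy as np

def predefined_time_control(x: float, T_c: float, rho: float = 0.5) -> float:
    """Textbook predefined-time stabilizer for xdot = u: x reaches 0 before T_c."""
    return -(np.pi / (2.0 * rho * T_c)) * (
        abs(x) ** (1.0 - rho) + abs(x) ** (1.0 + rho)
    ) * np.sign(x)

# Simulate xdot = u from a large initial error; settling beats T_c = 2 s.
T_c, dt = 2.0, 1e-4
x, t = 50.0, 0.0
while abs(x) > 1e-6 and t < 2 * T_c:
    x += predefined_time_control(x, T_c) * dt  # forward-Euler integration
    t += dt
print(f"settled to |x| < 1e-6 at t = {t:.3f} s (predefined bound T_c = {T_c} s)")
```

Running the loop from x(0) = 50 settles well before T_c = 2 s; the bound holds for any initial condition because the reaching time for this control law is capped at (2 T_c/π)·arctan(|x(0)|^ρ) < T_c.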
Keywords: PPO; scheduling strategy; predefined time; multi-agent systems; rescue dispatching
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/15/2387/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/15/2387/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:15:p:2387-:d:1447030