Application of a Deep Deterministic Policy Gradient Algorithm for Energy-Aimed Timetable Rescheduling Problem
Guang Yang,
Feng Zhang,
Cheng Gong and
Shiwen Zhang
Additional contact information
Guang Yang: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan RD, Shanghai 200240, China
Feng Zhang: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan RD, Shanghai 200240, China
Cheng Gong: School of Electrical and Computer Engineering, Georgia Institute of Technology, 85 5th Street NW, Atlanta, GA 30308, USA
Shiwen Zhang: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan RD, Shanghai 200240, China
Energies, 2019, vol. 12, issue 18, 1-19
Abstract:
Reinforcement learning has potential in intelligent transportation due to its generality and real-time capability. Q-learning, an early reinforcement learning algorithm, has its own merits for solving the train timetable rescheduling (TTR) problem. However, it has shortcomings in two respects: dimensional limits on the action space and a slow convergence rate. In this paper, a deep deterministic policy gradient (DDPG) algorithm is applied to solve the energy-aimed train timetable rescheduling (ETTR) problem. As a reinforcement learning algorithm, it meets the real-time requirements of the ETTR problem and adapts to random disturbances. Unlike Q-learning, DDPG operates in continuous state and action spaces. After sufficient training, the DDPG-based agent responds to random disturbances by continuously adjusting the cruising speed and dwell time of each train in a metro network. Although training requires thousands of episodes, the policy decision in each testing episode takes very little time. Models of the metro network, based on a real case of Shanghai Metro Line 1, are established as the training and testing environment. To validate the energy-saving effect and the real-time performance of the proposed algorithm, four experiments are designed and conducted. Compared with a no-action strategy, results show that the proposed algorithm operates in real time and saves a significant percentage of energy under random disturbances.
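The abstract describes a DDPG agent whose continuous actions are per-train adjustments to cruising speed and dwell time. Below is a minimal sketch (not the authors' code) of such an actor-critic pair in PyTorch; the state and action dimensions, network sizes, and scaling ranges are illustrative assumptions, and the paper's Shanghai Metro Line 1 environment is not reproduced here.

```python
# Minimal DDPG-style actor-critic sketch for a timetable-rescheduling agent.
# All dimensions and scalings below are assumptions for illustration only.
import torch
import torch.nn as nn

STATE_DIM = 8    # assumed: e.g. delays, positions, headways of nearby trains
ACTION_DIM = 2   # assumed: [cruising-speed adjustment, dwell-time adjustment]

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak averaging of target-network weights, as used in DDPG."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)

# Example: pick a rescheduling action for one train under a disturbance.
actor = Actor()
state = torch.randn(1, STATE_DIM)          # placeholder for a real metro state
action = actor(state).detach()             # values in [-1, 1]
speed_delta = action[0, 0].item() * 5.0    # assumed scaling: +/- 5 km/h
dwell_delta = action[0, 1].item() * 10.0   # assumed scaling: +/- 10 s
print(f"speed adjustment: {speed_delta:+.2f} km/h, "
      f"dwell adjustment: {dwell_delta:+.2f} s")
```

The bounded Tanh output is what gives the agent a continuous action space, in contrast to the discretized actions of Q-learning that the abstract identifies as a limitation.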
Keywords: deep deterministic policy gradient; reinforcement learning; random disturbances; train timetable rescheduling; timetable optimization
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2019
Citations: 4 (in EconPapers)
Downloads: (external link)
https://www.mdpi.com/1996-1073/12/18/3461/pdf (application/pdf)
https://www.mdpi.com/1996-1073/12/18/3461/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:12:y:2019:i:18:p:3461-:d:265235