Stability Analysis for Autonomous Vehicle Navigation Trained over Deep Deterministic Policy Gradient
Mireya Cabezas-Olivenza,
Ekaitz Zulueta,
Ander Sanchez-Chica,
Unai Fernandez-Gamiz and
Adrian Teso-Fz-Betoño
Additional contact information
Mireya Cabezas-Olivenza: System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano, 12, 01006 Vitoria-Gasteiz, Spain
Ekaitz Zulueta: System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano, 12, 01006 Vitoria-Gasteiz, Spain
Ander Sanchez-Chica: System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano, 12, 01006 Vitoria-Gasteiz, Spain
Unai Fernandez-Gamiz: Department of Nuclear and Fluid Mechanics, University of the Basque Country (UPV/EHU), Nieves Cano, 12, 01006 Vitoria-Gasteiz, Spain
Adrian Teso-Fz-Betoño: System Engineering and Automation Control Department, University of the Basque Country (UPV/EHU), Nieves Cano, 12, 01006 Vitoria-Gasteiz, Spain
Mathematics, 2022, vol. 11, issue 1, 1-27
Abstract:
The Deep Deterministic Policy Gradient (DDPG) algorithm is a reinforcement learning algorithm that combines Q-learning with a policy. Nevertheless, this algorithm generates failures that are not well understood. Rather than looking for those errors, this study presents a way to evaluate the suitability of the results obtained. For the purpose of autonomous vehicle navigation, the DDPG algorithm is applied, obtaining an agent capable of generating trajectories. This agent is evaluated in terms of stability through the Lyapunov function, verifying whether the proposed navigation objectives are achieved. Because it is unknown whether the actor and critic neural networks are correctly trained, the reward function of the DDPG is used as the basis for this evaluation. Two agents are obtained and compared in terms of stability, demonstrating that the Lyapunov function can be used as an evaluation method for agents obtained by the DDPG algorithm. By verifying stability over a fixed future horizon, it is possible to determine whether the obtained agent is valid and can be used as a vehicle controller, so a task-satisfaction assessment can be performed. Furthermore, the proposed analysis indicates which parts of the navigation area are insufficiently trained.
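As an illustration of the stability check described in the abstract, the sketch below rolls a trained actor forward over a fixed horizon and tests that a Lyapunov candidate is non-increasing along the closed-loop trajectory. The unicycle kinematics, the quadratic goal-distance candidate, the toy proportional actor, and parameters such as the horizon and time step are illustrative assumptions, not the paper's implementation: the article derives its candidate from the DDPG reward and evaluates the trained actor network itself.

    import numpy as np

    # Hypothetical unicycle kinematics standing in for the paper's vehicle model.
    def step_dynamics(state, action, dt=0.1):
        x, y, theta = state
        v, omega = action  # linear and angular velocity commands from the actor
        return np.array([x + v * np.cos(theta) * dt,
                         y + v * np.sin(theta) * dt,
                         theta + omega * dt])

    # Lyapunov candidate: squared distance to the goal. The paper builds its
    # candidate from the DDPG reward; this quadratic form is an assumption.
    def lyapunov(state, goal):
        return float(np.sum((state[:2] - goal) ** 2))

    def is_stable(actor, state0, goal, horizon=50, dt=0.1, tol=1e-6):
        """Roll the closed loop forward over a fixed horizon and require the
        Lyapunov candidate to be non-increasing at every step."""
        state = np.asarray(state0, dtype=float)
        v_prev = lyapunov(state, goal)
        for _ in range(horizon):
            state = step_dynamics(state, actor(state), dt)
            v_next = lyapunov(state, goal)
            if v_next - v_prev > tol:  # Delta V > 0: evidence of instability
                return False
            v_prev = v_next
        return True

    if __name__ == "__main__":
        goal = np.array([5.0, 5.0])

        # Toy proportional controller standing in for the trained DDPG actor.
        def toy_actor(state):
            dx, dy = goal - state[:2]
            heading_err = np.arctan2(dy, dx) - state[2]
            heading_err = np.arctan2(np.sin(heading_err), np.cos(heading_err))
            return np.array([min(1.0, np.hypot(dx, dy)), 2.0 * heading_err])

        # Sweep starting poses over the navigation area to map where the check
        # fails, mirroring the paper's identification of under-trained regions.
        for x0 in np.linspace(0.0, 4.0, 3):
            for y0 in np.linspace(0.0, 4.0, 3):
                print(f"start=({x0:.1f},{y0:.1f}) "
                      f"stable={is_stable(toy_actor, [x0, y0, 0.0], goal)}")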
Keywords: navigation; neural network; autonomous vehicle; reinforcement learning; DDPG; Lyapunov; stability; Q-learning
JEL-codes: C
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/1/132/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/1/132/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2022:i:1:p:132-:d:1016930
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.