EconPapers    
Economics at your fingertips  
 

Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations

Francisco Martinez-Gil, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra and Rafael Sebastián
Additional contact information
Francisco Martinez-Gil: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Miguel Lozano: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Ignacio García-Fernández: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Pau Romero: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Dolors Serra: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain
Rafael Sebastián: Computational Multiscale Simulation Lab (CoMMLab), Escola Tècnica Superior d’Enginyeria (ETSE-UV), Universitat de València, 46010 València, Spain

Mathematics, 2020, vol. 8, issue 9, 1-15

Abstract: Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a Reinforcement Learning classic algorithm (Sarsa( λ )) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.

Keywords: inverse reinforcement learning; optimization; causal entropy; reinforcement learning; learning by demonstration; pedestrian simulation (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2020
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/8/9/1479/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/9/1479/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:9:p:1479-:d:407733

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:8:y:2020:i:9:p:1479-:d:407733