Do Deep Reinforcement Learning Agents Model Intentions?

Tambet Matiisen, Aqeel Labash, Daniel Majoral, Jaan Aru and Raul Vicente
Additional contact information
Tambet Matiisen: Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia
Aqeel Labash: Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia
Daniel Majoral: Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia
Jaan Aru: Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia
Raul Vicente: Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia

Stats, 2022, vol. 6, issue 1, 1-17

Abstract: Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in its environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate this problem, we evaluated simple changes to the MADDPG training algorithm that led to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles and smart cities.
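The decoding analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the activations are synthetic stand-ins for hidden-layer recordings, the decoder is a ridge-regularised least-squares map onto one-hot targets (one simple choice of linear decoder; the paper may use a different one), and every function name, array shape and parameter below is a hypothetical choice made for illustration.

```python
import numpy as np


def fit_linear_decoder(X, y, n_classes, ridge=1e-3):
    """Fit a ridge-regularised least-squares map from activations to one-hot targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[y]                       # one-hot encode the target landmarks
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def decoder_accuracy(W, X, y):
    """Fraction of samples whose highest-scoring class matches the true target."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def make_synthetic_episodes(n_episodes, n_steps, n_hidden, n_landmarks, rng):
    """Hypothetical stand-in for recorded activations (NOT data from the paper).

    Each episode has one final target landmark. The target is encoded along a
    random direction in activation space with a strength that grows linearly
    from zero over the episode, on top of unit Gaussian noise.
    """
    targets = rng.integers(0, n_landmarks, size=n_episodes)
    directions = rng.normal(size=(n_landmarks, n_hidden))
    signal = np.linspace(0.0, 2.0, n_steps)
    noise = rng.normal(size=(n_episodes, n_steps, n_hidden))
    acts = noise + signal[None, :, None] * directions[targets][:, None, :]
    return acts, targets


def probe_over_time(acts, targets, n_landmarks, train_frac=0.8):
    """Train one decoder per time step and return held-out accuracy per step."""
    n_train = int(train_frac * acts.shape[0])
    accuracies = []
    for t in range(acts.shape[1]):
        W = fit_linear_decoder(acts[:n_train, t], targets[:n_train], n_landmarks)
        accuracies.append(decoder_accuracy(W, acts[n_train:, t], targets[n_train:]))
    return accuracies


rng = np.random.default_rng(0)
acts, targets = make_synthetic_episodes(
    n_episodes=600, n_steps=25, n_hidden=64, n_landmarks=3, rng=rng
)
accuracies = probe_over_time(acts, targets, n_landmarks=3)
print([round(a, 2) for a in accuracies[::6]])
```

With real data, `acts` would hold one agent's hidden-layer activations per episode and time step, and `targets` the landmark that a given agent (itself or another) ended up covering. Held-out accuracy that rises above chance as the episode unfolds is the kind of evidence the abstract refers to.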

Keywords: multi-agent reinforcement learning; theory of mind; artificial neural networks
JEL-codes: C1 C10 C11 C14 C15 C16
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2571-905X/6/1/4/pdf (application/pdf)
https://www.mdpi.com/2571-905X/6/1/4/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:6:y:2022:i:1:p:4-66:d:1017369

Stats is currently edited by Mrs. Minnie Li

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jstats:v:6:y:2022:i:1:p:4-66:d:1017369