
Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour

Olivia Macmillan-Scott and Mirco Musolesi

PLOS Computational Biology, 2025, vol. 21, issue 8, 1-18

Abstract: The coevolution of signalling is a complex problem within animal behaviour, and it is also central to communication between artificial agents. The Sir Philip Sidney game was designed to model this dyadic interaction from an evolutionary biology perspective, and was formulated to demonstrate the emergence of honest signalling. We use Multi-Agent Reinforcement Learning (MARL) to show that, in the majority of cases, the behaviour adopted by agents is not that shown in the original derivation of the model. This paper demonstrates that MARL can be a powerful tool to study evolutionary dynamics and to understand the underlying mechanisms of learning over generations; particularly advantageous are the interpretability of this type of approach and the fact that it allows us to study emergent behaviour without the need to constrain the strategy space from the outset. Although the game originally set out to exemplify honest signalling, we show that it provides no incentive for such behaviour. In the majority of cases, the optimal outcome is one that does not require a signal for the resource to be given. This type of interaction is observed in animal behaviour and is sometimes denoted proactive prosociality. High learning rates and low discount rates of the reinforcement learning model are shown to be optimal for achieving the outcome that maximises both agents' reward, and proximity to the given threshold leads to suboptimal learning.

Author summary: When is it too costly for animals to signal that they are in need? Signalling is a crucial part of communication in animal behaviour, and it is also central to other types of interactions, such as those involving artificial agents. We study emergent dynamics in the Sir Philip Sidney game, a game designed to show the mechanisms of honest signalling amongst animals. Using multi-agent reinforcement learning (MARL), we replicate generational learning and show that in the majority of scenarios, the optimal outcome is one of proactive prosociality rather than honest signalling: an outcome where the resource is given without the need for a costly signal. Such behaviour is observed in nature, most notably among primates. Our results also establish the usefulness of reinforcement learning as a tool to study emergent behaviour and dynamics in animal behaviour, for instance, as shown here, to study behavioural changes and learning over generations.
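The setting the abstract describes can be made concrete with a small simulation. The sketch below is illustrative only: it assumes the standard discrete parametrisation of the Sir Philip Sidney game (a needy or healthy signaller, a costly signal, and relatedness-weighted inclusive-fitness payoffs) and plain independent tabular Q-learning with epsilon-greedy exploration played as a one-shot episode, so discounting plays no role here. The parameter values, variable names, and episode structure are assumptions for this example and are not taken from the paper's implementation.

```python
import random
from collections import defaultdict

# Illustrative parameters (not taken from the paper): probability of being needy,
# survival loss if needy/healthy and refused, signal cost, donor's cost of giving,
# and relatedness between the two agents.
P_NEEDY, A, B, C, D, K = 0.5, 0.8, 0.2, 0.1, 0.3, 0.5
ALPHA, EPS, EPISODES = 0.5, 0.1, 50_000  # high learning rate, as in the abstract

q_sig = defaultdict(float)   # signaller: (needy?, "signal"/"quiet") -> value
q_don = defaultdict(float)   # donor:     (observed action, "give"/"keep") -> value

def choose(q, state, actions):
    """Epsilon-greedy action selection over a tabular Q function."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

for _ in range(EPISODES):
    needy = random.random() < P_NEEDY
    sig_act = choose(q_sig, needy, ["signal", "quiet"])
    don_act = choose(q_don, sig_act, ["give", "keep"])

    # Direct fitness under one common parametrisation of the game.
    sig_fit = 1.0 if don_act == "give" else 1.0 - (A if needy else B)
    sig_fit -= C if sig_act == "signal" else 0.0
    don_fit = 1.0 - D if don_act == "give" else 1.0

    # Inclusive-fitness rewards: own fitness plus relatedness-weighted partner fitness.
    r_sig = sig_fit + K * don_fit
    r_don = don_fit + K * sig_fit

    # One-shot episode, so the Q-update has no bootstrapped next-state term.
    q_sig[(needy, sig_act)] += ALPHA * (r_sig - q_sig[(needy, sig_act)])
    q_don[(sig_act, don_act)] += ALPHA * (r_don - q_don[(sig_act, don_act)])

# Inspect the donor's greedy policy, e.g. whether it gives even without a signal
# (the "proactive prosociality" outcome discussed in the abstract).
print({s: max(["give", "keep"], key=lambda a: q_don[(s, a)]) for s in ["signal", "quiet"]})
```

Depending on the chosen costs and relatedness, the learned donor policy may give the resource regardless of the signal, which is the kind of signal-free outcome the abstract calls proactive prosociality.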

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013302 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13302&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013302

DOI: 10.1371/journal.pcbi.1013302

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().

 
Page updated 2025-08-30
Handle: RePEc:plo:pcbi00:1013302