Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin

Takahiro Ezaki, Yutaka Horita, Masanori Takezawa and Naoki Masuda

PLOS Computational Biology, 2016, vol. 12, issue 7, 1-13

Abstract: Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner’s dilemma and public goods games, and well-mixed groups and networks. Different from the previous theory, individuals are assumed to have no access to information about what other individuals are doing such that they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.

Author Summary: Laboratory experiments using human participants have shown that, in groups or contact networks, humans often behave as conditional cooperators or their moody variant. Although conditional cooperation in dyadic interaction is well understood, mechanisms underlying these behaviors in groups or networks beyond a pair of individuals largely remain unclear. In this study, we show that players adopting a type of reinforcement learning exhibit these conditional cooperation behaviors. The results are general in the sense that the model explains experimental results to date obtained in various situations. It explains moody conditional cooperation, which is a recently discovered behavioral trait of humans, in addition to traditional conditional cooperation. It also explains experimental results obtained with both the prisoner’s dilemma and public goods games and with different population structures. Crucially, our model assumes that individuals do not have access to information about what other individuals are doing such that they cannot explicitly condition their behavior on how many others have previously cooperated. Thus, our results provide a proximate and unified understanding of these experimentally observed patterns.
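
Illustration: The aspiration-learning rule described in the abstract (satisfied if and only if the payoff exceeds a fixed aspiration level, then reinforce or anti-reinforce the action just taken) can be sketched roughly as follows. This record does not state the update equations, so the Bush-Mosteller-style update, the tanh stimulus, the function names, the payoff values (3, 0, 5, 1) and the aspiration level of 2 are all assumptions chosen for illustration, not the authors' exact model.

```python
import math
import random


def stimulus(payoff, aspiration, sensitivity=1.0):
    """Stimulus in (-1, 1); positive iff payoff > aspiration (assumed tanh form)."""
    return math.tanh(sensitivity * (payoff - aspiration))


def update_propensity(p, cooperated, payoff, aspiration, sensitivity=1.0):
    """Assumed Bush-Mosteller-style update of the probability of cooperating."""
    s = stimulus(payoff, aspiration, sensitivity)
    if cooperated:
        # Satisfied after cooperating: push p toward 1; dissatisfied: shrink p.
        return p + (1.0 - p) * s if s >= 0 else p + p * s
    # Satisfied after defecting: push p toward 0; dissatisfied: raise p.
    return p - p * s if s >= 0 else p - (1.0 - p) * s


# Hypothetical prisoner's dilemma payoffs, keyed by (my action, other's action),
# where True means "cooperate".
PAYOFFS = {
    (True, True): 3.0,
    (True, False): 0.0,
    (False, True): 5.0,
    (False, False): 1.0,
}


def play_round(p1, p2, rng, aspiration=2.0):
    """One round between two learners; each sees only its own payoff."""
    a1 = rng.random() < p1
    a2 = rng.random() < p2
    r1 = PAYOFFS[(a1, a2)]
    r2 = PAYOFFS[(a2, a1)]
    return (
        update_propensity(p1, a1, r1, aspiration),
        update_propensity(p2, a2, r2, aspiration),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    p1, p2 = 0.5, 0.5
    for _ in range(20):
        p1, p2 = play_round(p1, p2, rng)
    print(p1, p2)
```

Note that `update_propensity` never receives the other player's action, only the learner's own action and payoff, which mirrors the abstract's assumption that individuals cannot explicitly condition on what others did.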

Date: 2016

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005034 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05034&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005034

DOI: 10.1371/journal.pcbi.1005034

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2025-03-19
Handle: RePEc:plo:pcbi00:1005034