A multi-armed bandit algorithm speeds up the evolution of cooperation
Roberto Cazzolla Gatti
Ecological Modelling, 2021, vol. 439, issue C
Abstract:
Most evolutionary biologists consider selfishness an intrinsic feature of our genes and the best choice in social situations. In recent years, prolific research has been conducted on the mechanisms that allow cooperation to emerge “in a world of defectors” and become an evolutionarily stable strategy. A major debate started with W.D. Hamilton's proposal of “kin selection”, framed in terms of the cost sustained by cooperators and the benefit received by related conspecifics. Since then, four other main rules for the evolution of cooperation have been suggested. However, one of the main problems with these five rules is the assumption that the payoffs of cooperating or defecting are well known to the parties before they interact and do not change over time or after repeated encounters. This is not always the case in real life. By following each rule blindly, individuals risk getting stuck in an unfavorable situation. Axelrod (1984) highlighted that the main problem is how to obtain the benefits of cooperation without passing through many slow and painful trials and errors. With a better understanding of this process, individuals can use their foresight to speed up the evolution of cooperation. Here I show that a multi-armed bandit (MAB) model, a classic problem in decision sciences, is naturally employed by individuals to opt for the best choice most of the time, accelerating the evolution of altruistic behavior and solving the abovementioned problems. A common MAB strategy that applies extremely well to the evolution of cooperation is the epsilon-greedy (ε-greedy) algorithm. After an initial period of exploration (which can be considered as biological history), this algorithm greedily exploits the best-known option with probability 1−ε and explores the other options with probability ε. Through the epsilon-greedy decision-making algorithm, cooperation evolves as a multilevel process nested in the hierarchical levels that exist among the five rules for the evolution of cooperation. Reinforcement learning, a subfield of artificial intelligence based on trial and error, thus provides a powerful tool to better understand, and even probabilistically quantify, the chances cooperation has to evolve in a specific situation.
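As a rough illustration of the mechanism the abstract describes (not the paper's actual model), the following minimal Python sketch implements a generic ε-greedy multi-armed bandit: with probability ε the agent explores a random option, and otherwise it exploits the option with the highest estimated payoff so far. The two arms, their labels ("cooperate" and "defect"), and the payoff distributions are purely hypothetical assumptions chosen for the example.

    import random

    def epsilon_greedy_bandit(payoffs, epsilon=0.1, rounds=10_000, seed=0):
        """Simulate a generic epsilon-greedy multi-armed bandit.

        payoffs: list of (mean, std) tuples, one per arm (e.g., per strategy).
        With probability epsilon the agent explores a random arm;
        with probability 1 - epsilon it exploits the arm with the
        highest estimated payoff.
        """
        rng = random.Random(seed)
        n_arms = len(payoffs)
        counts = [0] * n_arms          # times each arm was chosen
        estimates = [0.0] * n_arms     # running mean reward per arm

        for _ in range(rounds):
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)                           # explore
            else:
                arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
            mean, std = payoffs[arm]
            reward = rng.gauss(mean, std)
            counts[arm] += 1
            # incremental update of the running mean reward estimate
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
        return counts, estimates

    # Hypothetical payoffs: arm 0 ~ "cooperate", arm 1 ~ "defect".
    counts, estimates = epsilon_greedy_bandit([(1.0, 0.5), (0.8, 0.5)])
    print(counts, [round(e, 2) for e in estimates])

With these illustrative payoffs the agent, after an initial exploratory phase, settles on the higher-paying arm most of the time while still occasionally sampling the alternative, which is the exploration/exploitation trade-off the abstract maps onto the evolution of cooperation.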
Keywords: Evolution of cooperation; Multi-armed bandit algorithm; Epsilon-greedy model; Matryoshka model
Date: 2021
Downloads: http://www.sciencedirect.com/science/article/pii/S0304380020304142 (full text for ScienceDirect subscribers only)
Persistent link: https://EconPapers.repec.org/RePEc:eee:ecomod:v:439:y:2021:i:c:s0304380020304142
DOI: 10.1016/j.ecolmodel.2020.109348
Ecological Modelling is currently edited by Brian D. Fath
Bibliographic data for series maintained by Catherine Liu.