Reinforcement learning in a prisoner's dilemma
Arthur Dolgopolov
Games and Economic Behavior, 2024, vol. 144, issue C, 84-103
Abstract:
I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed-form relationship between the learning rate and the game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
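The setup the abstract describes can be sketched in code. The following is a minimal illustration only, not the paper's exact algorithm: the payoff values, the fixed ε-greedy experimentation rate, and the simple averaging form of the stateless Q-update are all assumptions made here for concreteness, whereas the paper characterizes the limit as experimentation stops and derives when the learning rate and payoffs make cooperation stochastically stable.

```python
import random

# Illustrative prisoner's dilemma payoffs (assumed, not the paper's parametrization):
# row player's payoff for (own action, opponent's action).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ["C", "D"]


def play(alpha=0.1, epsilon=0.1, rounds=20000, seed=0):
    """Two stateless Q-learners repeatedly playing a prisoner's dilemma."""
    rng = random.Random(seed)
    # Stateless: each player keeps one Q-value per action, no game history.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for _ in range(rounds):
        # epsilon-greedy experimentation (the paper allows more general rules)
        acts = [
            rng.choice(ACTIONS) if rng.random() < epsilon
            else max(ACTIONS, key=qi.get)
            for qi in q
        ]
        rewards = [PAYOFF[(acts[0], acts[1])], PAYOFF[(acts[1], acts[0])]]
        for i in range(2):
            # stateless Q-update: move the played action's value toward the payoff
            q[i][acts[i]] += alpha * (rewards[i] - q[i][acts[i]])
    return q


q = play()
# Greedy actions after learning; with these dominant-strategy payoffs and
# persistent experimentation, defection typically prevails.
print([max(ACTIONS, key=qi.get) for qi in q])
```

With these stock payoffs defection strictly dominates, so both sketch agents end up defecting; the paper's point is that under vanishing experimentation the learning rate and payoff values jointly determine whether cooperation can instead be learned.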
Keywords: Q-learning; Stochastic stability; Evolutionary game theory; Collusion; Pricing algorithms
JEL-codes: C72 C73 D43 D83 L41
Date: 2024
Citations: 2 (in EconPapers)
Full text (ScienceDirect subscribers only): http://www.sciencedirect.com/science/article/pii/S0899825624000058
Persistent link: https://EconPapers.repec.org/RePEc:eee:gamebe:v:144:y:2024:i:c:p:84-103
DOI: 10.1016/j.geb.2024.01.004