Reinforcement learning in a prisoner's dilemma

Dolgopolov, Arthur

Games and Economic Behavior, 2024, vol. 144, issue C, 84-103

Abstract: I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed-form relationship between the learning rate and the game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
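The kind of learner described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model or its closed-form condition. It assumes hypothetical payoffs T=5, R=3, P=1, S=0, a stateless update Q(a) <- Q(a) + alpha*(r - Q(a)) applied only to the action just played, and epsilon-greedy exploration whose rate decays geometrically so that experimentation eventually stops. The names `simulate`, `payoff` and `choose` and the parameters `alpha`, `eps0` and `decay` are illustrative and do not come from the paper.

```python
import random

# Hypothetical prisoner's dilemma payoffs (T > R > P > S); assumed, not taken from the paper.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
ACTIONS = ("C", "D")


def payoff(a1, a2):
    """Return (player 1 payoff, player 2 payoff) for one play of the stage game."""
    table = {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T),
        ("D", "C"): (T, S),
        ("D", "D"): (P, P),
    }
    return table[(a1, a2)]


def choose(q, epsilon, rng):
    """Epsilon-greedy choice over a stateless Q-table (dict mapping action -> value)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # experiment: pick uniformly at random
    best = max(q.values())
    # Exploit: break ties uniformly among the greedy actions.
    return rng.choice([a for a in ACTIONS if q[a] == best])


def simulate(alpha=0.1, periods=50_000, eps0=1.0, decay=0.9999, seed=0):
    """Two stateless Q-learners play the repeated PD while exploration decays toward zero."""
    rng = random.Random(seed)
    q1 = {a: 0.0 for a in ACTIONS}
    q2 = {a: 0.0 for a in ACTIONS}
    epsilon = eps0
    for _ in range(periods):
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        r1, r2 = payoff(a1, a2)
        # Move the value of the action just played toward its realized payoff.
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
        epsilon *= decay  # assumed geometric decay of the experimentation rate
    return q1, q2


if __name__ == "__main__":
    q1, q2 = simulate()
    print("player 1:", q1, "greedy action:", max(ACTIONS, key=q1.get))
    print("player 2:", q2, "greedy action:", max(ACTIONS, key=q2.get))
```

Varying `alpha` and the payoff values in such a simulation is one informal way to explore how the learning rate and payoffs interact. The paper itself characterizes this relationship analytically rather than by simulation.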

Keywords: Q-learning; Stochastic stability; Evolutionary game theory; Collusion; Pricing algorithms
JEL codes: C72 C73 D43 D83 L41
Date: 2024
Citations: 2 (EconPapers count)

Downloads: http://www.sciencedirect.com/science/article/pii/S0899825624000058
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:gamebe:v:144:y:2024:i:c:p:84-103

DOI: 10.1016/j.geb.2024.01.004

Games and Economic Behavior is currently edited by E. Kalai

Publisher: Elsevier
Bibliographic data for the series maintained by Catherine Liu.