Entropy Regularization for Mean Field Games with Learning
Xin Guo,
Renyuan Xu and
Thaleia Zariphopoulou
Additional contact information
Xin Guo: Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720; Tsinghua-UC Berkeley Shenzhen Institute, Shenzhen 518055, China
Renyuan Xu: Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90089; Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Thaleia Zariphopoulou: Departments of Mathematics and IROM, The University of Texas at Austin, Austin, Texas 78712
Mathematics of Operations Research, 2022, vol. 47, issue 4, 3239-3260
Abstract:
Entropy regularization has been extensively adopted to improve the efficiency, the stability, and the convergence of algorithms in reinforcement learning. This paper analyzes both quantitatively and qualitatively the impact of entropy regularization for mean field games (MFGs) with learning in a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps stabilize and accelerate convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm with exploration in MFGs. With this algorithm, agents are able to learn the optimal exploration scheduling, with stable and fast convergence to the game equilibrium.
Keywords: Primary: 35B37; 90-XX; 91-XX; 68-XX; mean field games; multi-agent reinforcement learning; entropy regularization; linear-quadratic games
Date: 2022
Downloads:
http://dx.doi.org/10.1287/moor.2021.1238 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:47:y:2022:i:4:p:3239-3260
More articles in Mathematics of Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.