Dynamic Programming Principles for Mean-Field Controls with Learning

Haotian Gu, Xin Guo, Xiaoli Wei and Renyuan Xu
Additional contact information
Haotian Gu: Department of Mathematics, University of California, Berkeley, California 94720
Xin Guo: Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720
Xiaoli Wei: Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, China
Renyuan Xu: Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90001

Operations Research, 2023, vol. 71, issue 4, 1040-1054

Abstract: The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). In the learning framework of MFCs, however, the DPP has not been rigorously established, despite its critical importance for algorithm design. In this paper, we first present a simple example in MFCs with learning where the DPP fails with a misspecified Q function, and we then propose the correct form of the Q function in an appropriate space for MFCs with learning. This particular form of the Q function differs from the classical one and is called the IQ function. In the special case when the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting classical RL by replacing the state-action space with its probability distribution space. This identification of the IQ function enables us to establish precisely the DPP in the learning framework of MFCs. Finally, we illustrate through numerical experiments the time consistency of this IQ function.
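
To make the integration statement above concrete (a sketch; the notation below is assumed for illustration rather than quoted from the paper), write $\mathcal{X}$ and $\mathcal{A}$ for the state and action spaces, $\nu \in \mathcal{P}(\mathcal{X} \times \mathcal{A})$ for a state-action distribution, and $Q(x,a)$ for the classical single-agent Q function. In the mean-field-independent special case, the IQ function then takes the form

$$ IQ(\nu) \;=\; \int_{\mathcal{X} \times \mathcal{A}} Q(x,a)\, \nu(dx, da), $$

that is, the expectation of the classical Q function under $\nu$; replacing the state-action pair $(x,a)$ by its distribution $\nu$ is exactly the lifting from classical RL to MFCs with learning described above.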

Keywords: Financial Engineering; mean-field controls; dynamic programming principle; multi-agent reinforcement learning; reinforcement learning; Q-learning; cooperative game
Date: 2023

Downloads: http://dx.doi.org/10.1287/opre.2022.2395 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:71:y:2023:i:4:p:1040-1054

Handle: RePEc:inm:oropre:v:71:y:2023:i:4:p:1040-1054