Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Yuling Yan,
Gen Li,
Yuxin Chen and
Jianqing Fan
Additional contact information
Yuling Yan: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142
Gen Li: Department of Statistics, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
Yuxin Chen: Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Operations Research, 2024, vol. 72, issue 6, 2430-2445
Abstract:
This paper makes progress toward learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted, infinite-horizon Markov game with $S$ states, in which the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called value iteration with lower confidence bounds for zero-sum Markov games, that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C^{\star}_{\mathsf{clipped}}\, S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C^{\star}_{\mathsf{clipped}}$ is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0, \frac{1}{1-\gamma}\big]$. Our sample complexity bound strengthens prior art by a factor of $\min\{A, B\}$, achieving minimax optimality for a broad regime of interest. An appealing feature of our result lies in its algorithmic simplicity, which shows that neither variance reduction nor sample splitting is necessary for achieving sample optimality.
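To make the abstract's algorithmic description concrete, the sketch below illustrates pessimistic model-based value iteration for an offline zero-sum Markov game: build an empirical model from the dataset, penalize the max-player's Q-estimates with a confidence-bound bonus, and solve a zero-sum matrix game at each state. This is only a hedged illustration, not the authors' implementation; the function names (`matrix_game_value`, `vi_lcb_game`) and the simplified Hoeffding-style penalty are assumptions introduced here, whereas the paper uses sharper Bernstein-style (variance-aware) lower confidence bounds.

```python
# Illustrative sketch of pessimistic model-based value iteration for an
# offline two-player zero-sum Markov game. All names and the Hoeffding-style
# penalty are expository choices, not the paper's exact VI-LCB algorithm.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y, via an LP."""
    A, B = M.shape
    # Decision variables: (x_1, ..., x_A, v); maximize v <=> minimize -v.
    c = np.zeros(A + 1)
    c[-1] = -1.0
    # Constraints: v - x^T M[:, b] <= 0 for every min-player action b.
    A_ub = np.hstack([-M.T, np.ones((B, 1))])
    b_ub = np.zeros(B)
    # x must be a probability distribution.
    A_eq = np.zeros((1, A + 1))
    A_eq[0, :A] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * A + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def vi_lcb_game(counts, rewards, gamma, delta=0.05, n_iter=200):
    """Pessimistic value iteration on an empirical model from offline data.

    counts  : (S, A, B, S) array of visit counts N(s, a, b, s')
    rewards : (S, A, B) array of empirical mean rewards in [0, 1]
    Returns a pessimistic value estimate for the max-player at each state.
    """
    S, A, B, _ = counts.shape
    N = counts.sum(axis=-1)                        # N(s, a, b)
    P_hat = counts / np.maximum(N, 1)[..., None]   # empirical transitions
    # Hoeffding-style penalty; the paper's Bernstein bound is variance-aware.
    log_term = np.log(S * A * B / delta)
    bonus = np.sqrt(log_term / np.maximum(N, 1)) / (1 - gamma)
    V = np.zeros(S)
    for _ in range(n_iter):
        # Lower confidence bound on Q for the max-player, clipped at 0.
        Q = np.maximum(rewards + gamma * (P_hat @ V) - bonus, 0.0)
        # Each state's update solves the induced zero-sum matrix game.
        V = np.array([matrix_game_value(Q[s]) for s in range(S)])
    return V
```

A Bernstein-style bonus would replace the Hoeffding term with one depending on the empirical variance of the next-state value, which is what underlies the sharper $(1-\gamma)^{-3}$ dependence in the paper's bound; notably, the paper shows such a one-pass scheme needs neither variance reduction nor sample splitting.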
Keywords: Machine Learning and Data Science; zero-sum Markov games; Nash equilibrium; offline RL; model-based approach; unilateral coverage; curse of multiple agents; minimax optimality
Date: 2024
Downloads: http://dx.doi.org/10.1287/opre.2022.0342 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:6:p:2430-2445
More articles in Operations Research from INFORMS.