Fast Rates for the Regret of Offline Reinforcement Learning
Yichun Hu,
Nathan Kallus and
Masatoshi Uehara
Additional contact information
Yichun Hu: Cornell University, New York, New York 10044
Nathan Kallus: Cornell University, New York, New York 10044
Masatoshi Uehara: Cornell University, New York, New York 10044
Mathematics of Operations Research, 2025, vol. 50, issue 1, 633-655
Abstract:
We study the regret of offline reinforcement learning in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted Q-iteration (FQI), suggest root-n convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for regret convergence. First, we show that given any estimate for the optimal quality function, the regret of the policy it defines converges at a rate given by the exponentiation of the estimate's pointwise convergence rate, thus speeding up the rate. The level of exponentiation depends on the level of noise in the decision-making problem, rather than in the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply one-over-n rates in linear cases and exponential-in-n rates in tabular cases. Finally, we extend our findings to general function approximation via regret guarantees based on L_p-convergence rates for estimating the optimal quality function rather than pointwise rates, where L_2 guarantees for nonparametric estimation can be ensured under mild conditions.
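The fast-rate mechanism can be stated schematically: under a margin-type noise condition with exponent alpha on the decision problem, the greedy policy pi-hat induced by an estimate Q-hat satisfies, roughly, Regret(pi-hat) <= C * ||Q-hat - Q*||_inf^(1+alpha), so a root-n pointwise rate for Q-hat yields a one-over-n regret rate when alpha = 1. Below is a minimal, hypothetical sketch of fitted Q-iteration on an offline dataset; the `fit` regression hook, the `tabular_fit` helper, and all variable names are illustrative assumptions, not the authors' exact formulation.

```python
from collections import defaultdict

def fitted_q_iteration(transitions, fit, gamma=0.99, n_iters=50):
    """Generic FQI sketch: each iteration regresses Bellman backup targets.

    transitions: iterable of (s, a, r, s_next) tuples from the offline dataset.
    fit: regression hook mapping (inputs, targets) to a callable q(s, a).
    """
    actions = sorted({a for _, a, _, _ in transitions})
    q = lambda s, a: 0.0  # initialize at the zero quality function
    for _ in range(n_iters):
        X = [(s, a) for s, a, _, _ in transitions]
        # Bellman targets: r + gamma * max over a' of Q_k(s', a')
        y = [r + gamma * max(q(s2, b) for b in actions)
             for _, _, r, s2 in transitions]
        q = fit(X, y)  # one regression step = one FQI iteration
    # greedy policy defined by the final Q estimate
    policy = lambda s: max(actions, key=lambda a: q(s, a))
    return q, policy

def tabular_fit(X, y):
    """Illustrative tabular regression: average the targets in each (s, a) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (s, a), target in zip(X, y):
        sums[(s, a)] += target
        counts[(s, a)] += 1
    return lambda s, a: sums[(s, a)] / counts[(s, a)] if (s, a) in counts else 0.0
```

With `tabular_fit` plugged in, the regression step reduces to per-cell averaging, corresponding to the tabular setting the paper analyzes; in the linear setting, `fit` would instead be least squares on state-action features.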
Keywords: Primary: 90C40; 68T05; 68Q32; offline reinforcement learning; fast rates; margin condition; fitted Q-iteration; Bellman residual minimization
Date: 2025
Downloads: http://dx.doi.org/10.1287/moor.2021.0167 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:50:y:2025:i:1:p:633-655
More articles in Mathematics of Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.