Fast Rates for the Regret of Offline Reinforcement Learning

Yichun Hu, Nathan Kallus and Masatoshi Uehara
Additional contact information
Yichun Hu: Cornell University, New York, New York 10044
Nathan Kallus: Cornell University, New York, New York 10044
Masatoshi Uehara: Cornell University, New York, New York 10044

Mathematics of Operations Research, 2025, vol. 50, issue 1, 633-655

Abstract: We study the regret of offline reinforcement learning in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted Q-iteration (FQI), suggest root-n convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for regret convergence. First, we show that given any estimate for the optimal quality function, the regret of the policy it defines converges at a rate given by the exponentiation of the estimate's pointwise convergence rate, thus speeding up the rate. The level of exponentiation depends on the level of noise in the decision-making problem, rather than in the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply one-over-n rates in linear cases and exponential-in-n rates in tabular cases. Finally, we extend our findings to general function approximation by establishing regret guarantees based on L_p-convergence rates for estimating the optimal quality function rather than pointwise rates, where L_2 guarantees for nonparametric estimation can be ensured under mild conditions.
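A schematic reading of the rate result described in the abstract may help; the symbols Q-hat, b, and alpha below are illustrative notation assumed here, not taken from this record, and the display is stated only up to constants and logarithmic factors.

```latex
% Schematic fast-rate relation suggested by the abstract (illustrative notation):
% a pointwise estimation rate for the optimal quality function is exponentiated
% by a margin (noise-level) exponent \alpha of the decision-making problem.
\[
  \|\hat{Q} - Q^*\|_\infty \lesssim n^{-b}
  \quad\Longrightarrow\quad
  \mathrm{Regret}\big(\pi_{\hat{Q}}\big) \lesssim n^{-b(1+\alpha)},
  \qquad \pi_{\hat{Q}}(s) \in \arg\max_a \hat{Q}(s,a).
\]
```

For instance, b = 1/2 with alpha = 1 is consistent with the one-over-n rate cited for linear MDPs, while a hard margin (action-value gaps bounded away from zero) is consistent with the exponential-in-n tabular rate.

For concreteness, the following is a minimal tabular sketch of fitted Q-iteration (FQI) on a fixed offline dataset. The tabular averaging step stands in for the general regression/function-approximation step analyzed in the paper; all function names and the toy data are assumptions of this illustration, not the authors' implementation.

```python
import numpy as np

def fitted_q_iteration(data, n_states, n_actions, gamma=0.9, n_iters=200):
    """Tabular fitted Q-iteration on a fixed offline dataset.

    data: list of (s, a, r, s_next) transitions collected by some behavior
          policy; the algorithm never interacts with the environment.
    Returns the learned Q-table and the greedy policy it induces.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Regression targets: one-step Bellman backups under the current Q.
        target_sum = np.zeros((n_states, n_actions))
        count = np.zeros((n_states, n_actions))
        for s, a, r, s_next in data:
            target_sum[s, a] += r + gamma * Q[s_next].max()
            count[s, a] += 1
        # "Fit" step: in the tabular case, least-squares regression reduces
        # to averaging the targets within each (state, action) cell.
        seen = count > 0
        Q[seen] = target_sum[seen] / count[seen]
    return Q, Q.argmax(axis=1)

# Toy usage: synthetic transitions from a 3-state, 2-action problem where
# only state 2 yields reward; next states are drawn uniformly at random.
rng = np.random.default_rng(0)
offline_data = [(s, a, float(s == 2), int(rng.integers(3)))
                for s in range(3) for a in range(2) for _ in range(50)]
Q_hat, greedy_policy = fitted_q_iteration(offline_data, n_states=3, n_actions=2)
print(np.round(Q_hat, 2), greedy_policy)
```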

Keywords: Primary: 90C40; 68T05; 68Q32; offline reinforcement learning; fast rates; margin condition; fitted Q-iteration; Bellman residual minimization
Date: 2025

Downloads (external link): http://dx.doi.org/10.1287/moor.2021.0167

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:50:y:2025:i:1:p:633-655

More articles in Mathematics of Operations Research from INFORMS. Bibliographic data for this series maintained by Chris Asher.

 
Handle: RePEc:inm:ormoor:v:50:y:2025:i:1:p:633-655