Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

Mou, Wenlong; Pananjady, Ashwin; Wainwright, Martin J.

Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

Wenlong Mou (), Ashwin Pananjady () and Martin J. Wainwright ()
Additional contact information
Wenlong Mou: University of California, Berkeley, Berkeley, California 94720
Ashwin Pananjady: Georgia Institute of Technology, Atlanta, Georgia 30332
Martin J. Wainwright: University of California, Berkeley, Berkeley, California 94720; Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Mathematics of Operations Research, 2023, vol. 48, issue 4, 2308-2336

Abstract: Linear fixed-point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak–Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.

Keywords: Primary: 62J05; 62M05; 65M15; 90C40; 93E24; linear fixed-point equations; Galerkin methods; temporal-difference methods; stochastic approximation; local minimax lower bounds (search for similar items in EconPapers)
Date: 2023
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/moor.2022.1341 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:48:y:2023:i:4:p:2308-2336

Access Statistics for this article

More articles in Mathematics of Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().