Provably Efficient Reinforcement Learning with Linear Function Approximation

Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I.

Chi Jin, Zhuoran Yang, Zhaoran Wang and Michael I. Jordan
Additional contact information
Chi Jin: Princeton University, Princeton, New Jersey 08544
Zhuoran Yang: Yale University, New Haven, Connecticut 06520
Zhaoran Wang: Northwestern University, Evanston, Illinois 60208
Michael I. Jordan: University of California, Berkeley, Berkeley, California 94720

Mathematics of Operations Research, 2023, vol. 48, issue 3, 1496-1521

Abstract: Modern reinforcement learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation trade-off. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial run time and polynomial sample complexity in this linear setting, without requiring a “simulator” or additional assumptions. Concretely, we prove that an optimistic modification of least-squares value iteration, a classical algorithm frequently studied in the linear setting, achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret, where $d$ is the ambient dimension of feature space, $H$ is the length of each episode, and $T$ is the total number of steps. Importantly, such regret is independent of the number of states and actions.
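
The abstract does not spell out how the optimism is implemented. One common way to make least-squares value estimates optimistic is to add an exploration bonus that shrinks as more data is seen in a given feature direction. The following is a minimal sketch under that assumption, not the authors' implementation: the class name `OptimisticLinearQ`, the parameters `lam` (ridge regularizer) and `beta` (bonus scale), and the Sherman-Morrison update used to maintain the inverse matrix are all illustrative choices introduced here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


class OptimisticLinearQ:
    """Ridge regression on features plus an exploration bonus (hypothetical sketch).

    Maintains the inverse of Lambda = lam * I + sum(phi phi^T) and the
    vector b = sum(phi * target) over the samples seen so far.
    """

    def __init__(self, d, lam=1.0, beta=1.0, H=1.0):
        self.d = d
        self.beta = beta  # bonus scale (assumed tuning parameter)
        self.H = H        # cap on the value estimate (episode length)
        self.Lambda_inv = [
            [(1.0 / lam) if i == j else 0.0 for j in range(d)]
            for i in range(d)
        ]
        self.b = [0.0] * d

    def update(self, phi, target):
        """Add one sample via a Sherman-Morrison rank-one inverse update."""
        u = mat_vec(self.Lambda_inv, phi)  # Lambda^{-1} phi (Lambda is symmetric)
        denom = 1.0 + dot(phi, u)
        for i in range(self.d):
            for j in range(self.d):
                self.Lambda_inv[i][j] -= u[i] * u[j] / denom
        for i in range(self.d):
            self.b[i] += phi[i] * target

    def weights(self):
        """Ridge-regression solution w = Lambda^{-1} b."""
        return mat_vec(self.Lambda_inv, self.b)

    def bonus(self, phi):
        """Bonus beta * sqrt(phi^T Lambda^{-1} phi); large in unexplored directions."""
        quad = dot(phi, mat_vec(self.Lambda_inv, phi))
        return self.beta * math.sqrt(max(quad, 0.0))

    def q_value(self, phi):
        """Optimistic estimate: linear prediction plus bonus, capped at H."""
        return min(dot(self.weights(), phi) + self.bonus(phi), self.H)


model = OptimisticLinearQ(d=2, lam=1.0, beta=1.0, H=10.0)
model.update([1.0, 0.0], target=1.0)
print(model.bonus([1.0, 0.0]), model.bonus([0.0, 1.0]))
```

In an episodic setting, a sketch like this would typically keep one such estimator per step of the episode and refit them backward in time, using the observed reward plus the next step's maximal optimistic value as the regression target. The bonus in the direction that has been observed drops below the bonus in the unobserved direction, which is what drives exploration.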

Keywords: Primary: 90C40; secondary: 68T05; reinforcement learning; episodic MDP; linear function approximation; exploration
Date: 2023
DOI: http://dx.doi.org/10.1287/moor.2022.1309

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:48:y:2023:i:3:p:1496-1521