A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation

Zaiwei Chen, Siva T. Maguluri, Sanjay Shakkottai and Karthikeyan Shanmugam
Additional contact information
Zaiwei Chen: The School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332
Siva T. Maguluri: The School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332
Sanjay Shakkottai: Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712
Karthikeyan Shanmugam: IBM Research AI Group, Yorktown Heights, New York 10598

Operations Research, 2024, vol. 72, issue 4, 1352-1367

Abstract: This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(λ), off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of n-step TD and TD(λ), we provide theoretical insight into the efficiency of bootstrapping. Moreover, our finite-sample bounds for off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit.
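The Markovian SA iteration studied in the paper can be illustrated with a minimal sketch: a scalar iterate x_{k+1} = x_k + α_k (F(x_k, Y_k) - x_k), where {Y_k} is a Markov chain and the expected operator x ↦ E[F(x, Y)] (with Y drawn from the stationary distribution) is a contraction. The operator F, the transition matrix, and the step-size schedule below are illustrative toy choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of the underlying Markov chain {Y_k} over states {0, 1}.
# Its stationary distribution is pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

b = np.array([1.0, -1.0])  # state-dependent offsets (toy choice)

def F(x, y):
    # Noisy operator: x -> 0.5*x + b[y]. Averaged over the stationary
    # distribution, x -> 0.5*x + 1/3, a 0.5-contraction with fixed
    # point x* = (1/3) / (1 - 0.5) = 2/3.
    return 0.5 * x + b[y]

x, y = 0.0, 0
for k in range(1, 200001):
    alpha = 1.0 / k ** 0.75              # diminishing step size
    x = x + alpha * (F(x, y) - x)        # Markovian SA update
    y = rng.choice(2, p=P[y])            # advance the Markov chain

print(x)  # should be close to the fixed point 2/3
```

Even though the samples are Markovian rather than i.i.d., the iterate converges to the fixed point of the averaged operator; the paper's contribution is finite-sample bounds on how fast, via the generalized Moreau envelope as a Lyapunov function.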

Keywords: Machine Learning and Data Science; Markovian stochastic approximation; finite-sample analysis; Lyapunov drift method; generalized Moreau envelope; reinforcement learning
Date: 2024

Downloads: (external link)
http://dx.doi.org/10.1287/opre.2022.0249 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:4:p:1352-1367


More articles in Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:72:y:2024:i:4:p:1352-1367