Global Optimality Guarantees for Policy Gradient Methods

Jalaj Bhandari and Daniel Russo
Additional contact information
Jalaj Bhandari: Operations Research, Columbia University, New York, New York 10027
Daniel Russo: Graduate School of Business, Columbia University, New York, New York 10027

Operations Research, 2024, vol. 72, issue 5, 1906-1927

Abstract: Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face nonconvex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties, shared by several classic control problems, that ensure the policy gradient objective function has no suboptimal stationary points despite being nonconvex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2021.0014.

Keywords: Machine Learning and Data Science; reinforcement learning; policy gradient methods; policy iteration; dynamic programming; gradient dominance
Date: 2024

Downloads: (external link)
http://dx.doi.org/10.1287/opre.2021.0014 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:5:p:1906-1927

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:72:y:2024:i:5:p:1906-1927