EconPapers    
Economics at your fingertips  
 

Q-learning and policy iteration algorithms for stochastic shortest path problems

Huizhen Yu () and Dimitri Bertsekas ()

Annals of Operations Research, 2013, vol. 208, issue 1, 95-132

Abstract: We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1):66-94, 2012 ). The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous deterministic and stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems. These implementations overcome some of the traditional convergence difficulties of asynchronous modified policy iteration, and provide policy iteration-like alternative Q-learning schemes with as reliable convergence as classical Q-learning. We also discuss methods that use basis function approximations of Q-factors and we give an associated error bound. Copyright Springer Science+Business Media, LLC 2013

Keywords: Markov decision processes; Q-learning; Approximate dynamic programming; Value iteration; Policy iteration; Stochastic shortest paths; Stochastic approximation (search for similar items in EconPapers)
Date: 2013
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
http://hdl.handle.net/10.1007/s10479-012-1128-z (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:annopr:v:208:y:2013:i:1:p:95-132:10.1007/s10479-012-1128-z

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10479

DOI: 10.1007/s10479-012-1128-z

Access Statistics for this article

Annals of Operations Research is currently edited by Endre Boros

More articles in Annals of Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:annopr:v:208:y:2013:i:1:p:95-132:10.1007/s10479-012-1128-z