An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Bhatnagar, Shalabh; Lakshmanan, K.

Shalabh Bhatnagar and K. Lakshmanan
Additional contact information
Shalabh Bhatnagar: Indian Institute of Science
K. Lakshmanan: Indian Institute of Science

Journal of Optimization Theory and Applications, 2012, vol. 153, issue 3, No 9, 688-708

Abstract: We develop an online actor–critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
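
Illustrative note: the constrained problem described in the abstract can be sketched in the generic Lagrangian form below. The symbols (policy parameter \theta, single-stage cost c, constraint costs g_i, thresholds \alpha_i, multipliers \lambda_i) are assumed here for illustration; this record does not give the paper's own notation or update rules.

```latex
% Long-run average cost objective and constraints (assumed notation)
\min_{\theta} \; J(\theta) = \lim_{n \to \infty} \frac{1}{n}
  \, \mathbb{E}\!\left[ \sum_{t=0}^{n-1} c(s_t, a_t) \right]
\quad \text{subject to} \quad
G_i(\theta) = \lim_{n \to \infty} \frac{1}{n}
  \, \mathbb{E}\!\left[ \sum_{t=0}^{n-1} g_i(s_t, a_t) \right] \le \alpha_i,
\qquad i = 1, \dots, N.

% Lagrangian relaxation with nonnegative multipliers
L(\theta, \lambda) = J(\theta) + \sum_{i=1}^{N} \lambda_i \bigl( G_i(\theta) - \alpha_i \bigr),
\qquad \lambda_i \ge 0.
```

In a Lagrange multiplier approach of this kind, one typically descends in \theta and ascends in \lambda, so that a multiplier grows while its constraint is violated.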

Keywords: Actor–critic algorithm; Constrained Markov decision processes; Long-run average cost criterion; Function approximation
Date: 2012

Downloads:
http://link.springer.com/10.1007/s10957-012-9989-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:153:y:2012:i:3:d:10.1007_s10957-012-9989-5

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2

DOI: 10.1007/s10957-012-9989-5

Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.