A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy and R. Srikant
Additional contact information
Mehrdad Moharrami: Computer Science Department, University of Iowa, Iowa City, Iowa 52242
Yashaswini Murthy: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801; and Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Arghyadip Roy: Mehta Family School of Data Science and Artificial Intelligence, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
R. Srikant: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801; and Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801

Mathematics of Operations Research, 2025, vol. 50, issue 1, 431-458

Abstract: We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find a stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike in the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address this issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth, truncated approximation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
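
Illustration (not from the paper): the Python sketch below shows, under stated assumptions, what one trajectory-based Monte-Carlo gradient step for the finite-horizon risk-sensitive exponential cost J(theta) = (1/(beta*T)) log E[exp(beta * sum_t c_t)] can look like with a softmax policy on a toy tabular MDP. The MDP, horizon, risk parameter beta, learning rate, and the clipping constant CLIP (a crude stand-in for the truncation and smoothing described in the abstract) are all illustrative assumptions, not the authors' construction.

import numpy as np

rng = np.random.default_rng(0)
S, A, T, beta, CLIP = 3, 2, 50, 0.5, 30.0   # states, actions, horizon, risk parameter, clip level (all assumed)

P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution of an assumed toy MDP
C = rng.uniform(0.0, 1.0, size=(S, A))      # per-step costs in [0, 1]
theta = np.zeros((S, A))                    # softmax policy parameters

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def sample_trajectory():
    # Roll out one trajectory; return its cumulative cost and summed score function.
    s, total_cost, score = 0, 0.0, np.zeros_like(theta)
    for _ in range(T):
        probs = policy(s)
        a = rng.choice(A, p=probs)
        score[s] -= probs                   # grad of log pi(a|s) for softmax: e_a - pi(.|s)
        score[s, a] += 1.0
        total_cost += C[s, a]
        s = rng.choice(S, p=P[s, a])
    return total_cost, score

def gradient_step(n_traj=200, lr=0.05):
    # One descent step on J(theta) = (1/(beta*T)) * log E[exp(beta * sum_t c_t)].
    global theta
    costs, scores = zip(*(sample_trajectory() for _ in range(n_traj)))
    clipped = np.clip(beta * np.array(costs), None, CLIP)   # crude stand-in for truncation/smoothing
    w = np.exp(clipped)
    w /= w.sum()                                            # self-normalized exponential-cost weights
    grad = sum(wi * sc for wi, sc in zip(w, scores)) / (beta * T)
    theta = theta - lr * grad                               # minimize the risk-sensitive cost
    return np.log(np.mean(np.exp(clipped))) / (beta * T)    # Monte-Carlo estimate of J(theta)

for it in range(20):
    print(f"iter {it:2d}  estimated risk-sensitive cost {gradient_step():.4f}")

The self-normalized weights approximate the ratio E[exp(beta*C) * grad log p_theta] / E[exp(beta*C)], a finite-horizon analogue of sample-path gradient formulas of this kind; the paper's own algorithm, truncation scheme, and convergence conditions differ in detail.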

Keywords: 60J20; risk-sensitive Markov decision processes; reinforcement learning; policy gradient theorem; stochastic approximation (search for similar items in EconPapers)
Date: 2025

Downloads: http://dx.doi.org/10.1287/moor.2022.0139 (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:50:y:2025:i:1:p:431-458

More articles in Mathematics of Operations Research from INFORMS. Bibliographic data for this series is maintained by Chris Asher.

 
Handle: RePEc:inm:ormoor:v:50:y:2025:i:1:p:431-458