On the Convex Formulations of Robust Markov Decision Processes

Grand-Clément, Julien; Petrik, Marek

Julien Grand-Clément and Marek Petrik
Additional contact information
Julien Grand-Clément: Information Systems and Operations Management Department, Ecole des Hautes Etudes Commerciales (HEC) de Paris, Jouy-en-Josas, France
Marek Petrik: Department of Computer Science, University of New Hampshire, Durham, New Hampshire 03824

Mathematics of Operations Research, 2025, vol. 50, issue 3, 1681-1706

Abstract: Robust Markov decision processes (RMDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases, RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and nonnegative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
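
Illustration (not part of the published abstract): the abstract notes that value iteration extends directly to RMDPs. The Python sketch below shows what that robust Bellman update can look like under sa-rectangularity. It is not the paper's convex formulation. It assumes reward maximization, a discount factor `gamma`, and, purely for simplicity, a finite list of candidate transition distributions for each state-action pair. The function name, argument names, and toy instance are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: each (state, action) pair has a finite list of candidate
# transition distributions. The paper itself treats polyhedral, ellipsoidal,
# and entropy-based uncertainty sets; the finite list is a simplification.
UncertaintySet = Dict[Tuple[int, int], List[List[float]]]


def robust_value_iteration(
    n_states: int,
    n_actions: int,
    rewards: List[List[float]],   # rewards[s][a]
    uncertainty: UncertaintySet,  # uncertainty[(s, a)] -> candidate distributions
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate v(s) <- max_a [ r(s,a) + gamma * min_p sum_t p(t) v(t) ]."""
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = []
        for s in range(n_states):
            best = float("-inf")
            for a in range(n_actions):
                # sa-rectangularity: the adversary picks the worst-case
                # distribution separately for every (s, a) pair.
                worst = min(
                    sum(p[t] * v[t] for t in range(n_states))
                    for p in uncertainty[(s, a)]
                )
                best = max(best, rewards[s][a] + gamma * worst)
            new_v.append(best)
        # Stop once successive iterates agree up to the tolerance.
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy instance (hypothetical): state 1 is absorbing with zero reward.
# In state 0, action 0 earns reward 1 and stays in state 0 with probability
# 1.0 or 0.5 (the adversary chooses); action 1 earns 0 and moves to state 1.
rewards = [[1.0, 0.0], [0.0, 0.0]]
uncertainty = {
    (0, 0): [[1.0, 0.0], [0.5, 0.5]],
    (0, 1): [[0.0, 1.0]],
    (1, 0): [[0.0, 1.0]],
    (1, 1): [[0.0, 1.0]],
}
values = robust_value_iteration(2, 2, rewards, uncertainty, gamma=0.9)
print(values)
```

In this toy instance the adversary always selects the 0.5 stay probability, so the robust value of state 0 solves v = 1 + 0.9 * 0.5 * v, which gives v = 1 / 0.55 (about 1.818). The nominal model with stay probability 1.0 would instead give 10.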

Keywords: Primary: 90C40; Secondary: 90C17; 90C25; Markov decision processes; robust optimization; conic optimization
Date: 2025

Downloads:
http://dx.doi.org/10.1287/moor.2022.0284 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:50:y:2025:i:3:p:1681-1706

Mathematics of Operations Research is published by INFORMS.