Self-Adapting Network Relaxations for Weakly Coupled Markov Decision Processes

Nadarajah, Selvaprabu; Cire, Andre A.

Selvaprabu Nadarajah: Information and Decision Sciences, University of Illinois Chicago, Chicago, Illinois 60607
Andre A. Cire: Department of Management, University of Toronto Scarborough & Rotman School of Management, Toronto, Ontario M5S 1A1, Canada

Management Science, 2025, vol. 71, issue 2, 1779-1802

Abstract: High-dimensional weakly coupled Markov decision processes (WDPs) arise in dynamic decision making and reinforcement learning, decomposing into smaller Markov decision processes (MDPs) when linking constraints are relaxed. The Lagrangian relaxation of WDPs (LAG) exploits this property to compute policies and (optimistic) bounds efficiently; however, dualizing linking constraints averages away combinatorial information. We introduce feasibility network relaxations (FNRs), a new class of linear programming relaxations that exactly represents the linking constraints. We develop a procedure to obtain the unique minimally sized relaxation, which we refer to as self-adapting FNR, as its size automatically adjusts to the structure of the linking constraints. Our analysis informs model selection: (i) the self-adapting FNR provides (weakly) stronger bounds than LAG, is polynomially sized when linking constraints admit a tractable network representation, and can even be smaller than LAG, and (ii) self-adapting FNR provides bounds and policies that match the approximate linear programming (ALP) approach but is substantially smaller in size than the ALP formulation and a recent alternative Lagrangian that is equivalent to ALP. We perform numerical experiments on constrained dynamic assortment and preemptive maintenance applications. Our results show that self-adapting FNR significantly improves upon LAG in terms of policy performance and/or bounds, while being an order of magnitude faster than an alternative Lagrangian and ALP, which are unsolvable in several instances.
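To make the Lagrangian relaxation (LAG) in the abstract concrete, here is a minimal sketch under stated assumptions. It dualizes a single cardinality linking constraint (at most `CAPACITY` subproblems acted on per period) with a constant multiplier λ. With that constraint dualized, the problem separates into one small MDP per subproblem, and each is solved by backward induction. The resulting optimistic bound is then compared with the exact optimum from brute-force dynamic programming on the joint MDP. The three-arm instance, the names (`ACT_VALUE`, `DRIFT`, `lagrangian_bound`, and so on), the time-invariant multiplier, and the grid search over λ are all hypothetical choices made for illustration. This is the textbook Lagrangian idea only, not the paper's feasibility network relaxation or the authors' code.

```python
from itertools import product

# Hypothetical toy weakly coupled MDP (illustrative only, not from the paper).
HORIZON = 4    # decision periods
CAPACITY = 1   # linking constraint: sum_i a_i <= CAPACITY in every period
STATES = (0, 1, 2)
ACT_VALUE = [(0.0, 2.0, 5.0), (0.0, 3.0, 4.0), (1.0, 1.5, 6.0)]  # reward of acting on arm i in state s
DRIFT = [0.5, 0.7, 0.4]  # prob. an idle arm moves up one state (capped at the top state)
START = (0, 0, 0)


def arm_transitions(i, s, a):
    """Return [(next_state, prob)] for arm i; acting resets the arm to state 0."""
    if a == 1:
        return [(0, 1.0)]
    if s == max(STATES):
        return [(s, 1.0)]
    return [(s + 1, DRIFT[i]), (s, 1.0 - DRIFT[i])]


def reward(i, s, a):
    return ACT_VALUE[i][s] if a == 1 else 0.0


def exact_value():
    """Backward induction on the joint MDP, enforcing the linking constraint."""
    n = len(ACT_VALUE)
    joint_states = list(product(STATES, repeat=n))
    feasible = [a for a in product((0, 1), repeat=n) if sum(a) <= CAPACITY]
    v_next = {s: 0.0 for s in joint_states}
    for _ in range(HORIZON):
        v = {}
        for s in joint_states:
            best = float("-inf")
            for a in feasible:
                q = sum(reward(i, s[i], a[i]) for i in range(n))
                # Arms evolve independently given the joint action.
                branches = [arm_transitions(i, s[i], a[i]) for i in range(n)]
                for outcome in product(*branches):
                    prob = 1.0
                    for _, p in outcome:
                        prob *= p
                    q += prob * v_next[tuple(ns for ns, _ in outcome)]
                best = max(best, q)
            v[s] = best
        v_next = v
    return v_next[START]


def lagrangian_bound(lam):
    """Upper bound from dualizing sum_i a_i <= CAPACITY with multiplier lam >= 0.

    The relaxed problem separates into one MDP per arm with reward
    r_i(s, a) - lam * a, plus lam * CAPACITY collected in every period.
    """
    total = HORIZON * lam * CAPACITY
    for i in range(len(ACT_VALUE)):
        v_next = {s: 0.0 for s in STATES}
        for _ in range(HORIZON):
            v = {}
            for s in STATES:
                v[s] = max(
                    reward(i, s, a) - lam * a
                    + sum(p * v_next[ns] for ns, p in arm_transitions(i, s, a))
                    for a in (0, 1)
                )
            v_next = v
        total += v_next[START[i]]
    return total


if __name__ == "__main__":
    v_star = exact_value()
    # Crude grid search over a single time-invariant multiplier in [0, 10].
    lag = min(lagrangian_bound(k / 10) for k in range(101))
    print(f"exact optimum {v_star:.3f} <= LAG bound {lag:.3f}")
```

Each relaxed subproblem has only three states, so evaluating the bound for a given λ costs a few tiny dynamic programs. The exact solve, by contrast, enumerates all 27 joint states, and that count grows exponentially with the number of arms. Any remaining gap between the two printed values reflects both the information lost by dualizing the linking constraint and the restriction to a single constant multiplier.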

Keywords: Markov decision processes; networks; linear programming; weakly coupled dynamic programs
Date: 2025

Downloads: http://dx.doi.org/10.1287/mnsc.2022.01108

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:71:y:2025:i:2:p:1779-1802

More articles in Management Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.