EconPapers    
Economics at your fingertips  
 

A Low-Rank Approximation for MDPs via Moment Coupling

Amy B. Z. Zhang () and Itai Gurvich ()
Additional contact information
Amy B. Z. Zhang: School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853
Itai Gurvich: Kellogg School of Management, Northwestern University, Evanston, Illinois 60208

Operations Research, 2024, vol. 72, issue 3, 1255-1277

Abstract: We introduce a framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. [Braverman A, Gurvich I, Huang J (2020) On the Taylor expansion of value functions. Oper. Res. 68(2):631–65] that relates the solution of the Bellman equation to that of a partial differential equation (PDE) where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a “sister” (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this moment matching , the original chain and its sister are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. Computational gains arise from the reduction of the effective state space from N to N 1 2 + ϵ is as one might intuitively expect from approximations grounded in the central limit theorem.

Keywords: Optimization; Markov processes; approximate dynamic programming; state aggregation; parameter design; algorithm analysis (search for similar items in EconPapers)
Date: 2024
References: Add references at CitEc
Citations:

Downloads: (external link)
http://dx.doi.org/10.1287/opre.2022.2392 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:3:p:1255-1277

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:72:y:2024:i:3:p:1255-1277