Learning the Minimal Representation of a Continuous State-Space Markov Decision Process from Transition Data
Amine Bennouna,
Dessislava Pachamanova,
Georgia Perakis and
Omar Skali Lami
Additional contact information
Amine Bennouna: Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Dessislava Pachamanova: Mathematics, Analytics, Science and Technology Division, Babson College, Wellesley, Massachusetts 02457
Georgia Perakis: Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Omar Skali Lami: Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Management Science, 2025, vol. 71, issue 6, 5162-5184
Abstract:
This paper proposes a framework for learning the most concise Markov decision process (MDP) model of a continuous state-space dynamic system from observed transition data. This setting arises in numerous important applications, such as patient treatment, online advertising, recommender systems, and estimation of treatment effects in econometrics. Most existing methods in offline reinforcement learning construct functional approximations of the value or the transition and reward functions, requiring complex and often uninterpretable function approximators. Our approach instead relies on partitioning the system's observation space into regions that constitute the states of a finite MDP representing the system. We discuss the theoretically minimal MDP representation that preserves the values and, therefore, the optimal policy of the dynamic system (in a sense, the optimal discretization). We formally define the problem of learning such a concise representation from transition data without exploration. Learning such a representation allows for enhanced tractability and, importantly, provides interpretability. To solve this problem, we introduce an in-sample property of partitions of the observation space that we name coherence, and we show that, if the class of possible partitions has finite Vapnik-Chervonenkis dimension, any partition coherent with the transition data converges to the minimal representation of the system with provable finite-sample probably approximately correct (PAC) convergence guarantees. This insight motivates our minimal representation learning algorithm, which constructs from transition data an MDP representation that approximates the minimal representation of the system. We illustrate the effectiveness of the proposed framework through numerical experiments in both deterministic and stochastic environments as well as with real data.
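The abstract describes state aggregation: partitioning a continuous observation space into finitely many regions, treating each region as a state of a finite MDP estimated from transition data, and solving that finite MDP. The Python sketch below illustrates only this general aggregate-and-solve idea with a fixed equal-width partition of a one-dimensional observation space; it is not the paper's coherence-based minimal representation algorithm, and all names and parameters (n_bins, n_actions, gamma, the [low, high] range) are illustrative assumptions.

import numpy as np

def aggregate_mdp(transitions, n_bins=10, n_actions=2, low=0.0, high=1.0):
    """Estimate a finite MDP from (obs, action, reward, next_obs) tuples
    by partitioning [low, high] into n_bins equal-width regions (states)."""
    edges = np.linspace(low, high, n_bins + 1)

    def to_state(x):
        # Map a continuous observation to the index of its region.
        return int(np.clip(np.searchsorted(edges, x, side="right") - 1,
                           0, n_bins - 1))

    counts = np.zeros((n_bins, n_actions, n_bins))  # transition counts N(s, a, s')
    reward_sum = np.zeros((n_bins, n_actions))      # cumulative rewards per (s, a)
    visits = np.zeros((n_bins, n_actions))          # visit counts per (s, a)
    for obs, a, r, next_obs in transitions:
        s, s_next = to_state(obs), to_state(next_obs)
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
        visits[s, a] += 1

    safe = np.maximum(visits, 1)                    # avoid division by zero
    P = counts / safe[:, :, None]                   # empirical transition kernel
    R = reward_sum / safe                           # empirical mean rewards
    return P, R

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Compute optimal values and a greedy policy for the finite MDP (P, R)."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)                     # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)          # values, region-wise greedy policy
        V = V_new

For example, feeding aggregate_mdp a list of tuples such as (0.12, 0, 1.0, 0.37) yields a 10-state MDP whose optimal policy is defined region by region. The key difference in the paper's framework is that the partition is itself learned from the data, subject to the coherence property, rather than fixed a priori as in this sketch.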
Keywords: reinforcement learning; statistical learning; block Markov decision process; discretization; interpretability; data-driven decision making; state representation learning; MDP state aggregation
Date: 2025
Downloads: http://dx.doi.org/10.1287/mnsc.2022.01652 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:71:y:2025:i:6:p:5162-5184
More articles in Management Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.