Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity
Imon Banerjee,
Harsha Honnappa and
Vinayak Rao
Affiliations:
Imon Banerjee: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208
Harsha Honnappa: Edwardson School of Industrial Engineering, Purdue University, West Lafayette, Indiana 47907
Vinayak Rao: Department of Statistics, Purdue University, West Lafayette, Indiana 47907
Operations Research, 2025, vol. 73, issue 4, 2281-2295
Abstract:
In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an off-line setting with a fixed data set of size m, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results on several examples, including ergodic Markov chains; weakly ergodic inhomogeneous Markov chains; and controlled Markov chains with nonstationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant bounds for off-line evaluation of stationary Markov control policies.
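The abstract does not define the estimator explicitly. A common "natural nonparametric" choice for a finite controlled Markov chain, assumed here rather than taken from the paper, is the empirical-frequency estimator P̂(s' | s, a) = N(s, a, s') / N(s, a), where N counts transitions among the m logged samples. The sketch below pairs it with plug-in off-line evaluation of a stationary Markov policy by iterating the Bellman equation under P̂. The function names, the reward table, the discount factor gamma, and the uniform fallback for state-action pairs the logging policy never visited are all hypothetical illustrations, not the authors' construction.

```python
from typing import List, Sequence, Tuple

# One logged transition: (state, action, next_state), all 0-based indices.
Transition = Tuple[int, int, int]


def estimate_transitions(
    data: Sequence[Transition], n_states: int, n_actions: int
) -> List[List[List[float]]]:
    """Count-based estimate p_hat[s][a][t] of P(next = t | state = s, action = a)."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, t in data:
        counts[s][a][t] += 1

    p_hat = []
    for s in range(n_states):
        rows = []
        for a in range(n_actions):
            n_sa = sum(counts[s][a])
            if n_sa > 0:
                rows.append([c / n_sa for c in counts[s][a]])
            else:
                # Hypothetical convention for (s, a) pairs the logging policy
                # never visited: fall back to the uniform distribution.
                rows.append([1.0 / n_states] * n_states)
        p_hat.append(rows)
    return p_hat


def evaluate_policy(p_hat, policy, reward, gamma, tol=1e-10, max_iter=10_000):
    """Plug-in discounted value of a stationary Markov policy under p_hat.

    policy[s][a] is the probability of taking action a in state s.
    reward[s][a] and gamma are hypothetical inputs (not defined in the abstract).
    Iterates V(s) = sum_a policy(a|s) * (r(s,a) + gamma * sum_t p_hat(t|s,a) V(t)).
    """
    n_states = len(p_hat)
    v = [0.0] * n_states
    for _ in range(max_iter):
        v_new = [
            sum(
                policy[s][a]
                * (reward[s][a] + gamma * sum(p * x for p, x in zip(p_hat[s][a], v)))
                for a in range(len(policy[s]))
            )
            for s in range(n_states)
        ]
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    return v


# Usage: a 2-state chain with a single action, logged for m = 4 steps.
data = [(0, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 1)]
p_hat = estimate_transitions(data, n_states=2, n_actions=1)
# p_hat[0][0] is approximately [1/3, 2/3]; p_hat[1][0] == [1.0, 0.0]
v = evaluate_policy(p_hat, policy=[[1.0], [1.0]], reward=[[1.0], [1.0]], gamma=0.5)
# both entries are approximately 2.0 = 1 / (1 - 0.5)
```

The value-iteration loop is used only because it needs no linear-algebra library; solving the linear system (I - gamma * P̂_π) V = r_π directly would give the same plug-in value.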
Keywords: Stochastic; Models; reinforcement learning; controlled Markov chains; stochastic processes; policy evaluation; nonparametric statistics
Date: 2025
DOI: http://dx.doi.org/10.1287/opre.2023.0046
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:73:y:2025:i:4:p:2281-2295
Publisher: INFORMS (Operations Research).