Performance Guarantees for Empirical Markov Decision Processes with Applications to Multiperiod Inventory Models

Cooper, William L.; Rangarajan, Bharath

William L. Cooper: Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455
Bharath Rangarajan: Merchandising Operations, Target Corporation, Minneapolis, Minnesota 55402

Operations Research, 2012, vol. 60, issue 5, 1267-1281

Abstract: We consider Markov decision processes (MDPs) with unknown transition probabilities and unknown single-period expected cost functions, and we study a method for estimating these quantities from historical or simulated data. The method requires knowledge of the system equations that govern state transitions and of the single-period cost functions (but not the single-period expected cost functions). The estimation procedure takes expectations with respect to the empirical distribution functions of such data. Once the estimates are in place, the method computes a policy by solving the resulting “empirical” MDP as if the estimates were correct. For MDPs that satisfy certain conditions, we provide explicit, easily computed expressions for the probability that the procedure will produce a policy whose true expected cost is within any specified absolute distance of the actual optimal expected cost of the true MDP. We also provide expressions for the number of historical or simulated data values sufficient for the procedure to produce a policy whose true expected cost is, with a prescribed probability, within a prescribed absolute distance of the actual optimal expected cost of the true MDP. We apply our results to multiperiod inventory models. In addition, we provide a specialized analysis of such inventory models that also yields relative, rather than absolute, accuracy guarantees. We make comparisons with related results that have recently appeared, and we provide numerical examples.
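The estimate-then-optimize procedure summarized in the abstract can be illustrated with a small sketch. It assumes a finite-horizon, periodic-review inventory model with full backlogging, zero lead time, integer inventory levels, i.i.d. demand, a linear ordering cost c, and per-unit holding and backorder costs h and p. These modeling choices, the order-up-to cap y_max, and the function name solve_empirical_inventory are hypothetical illustrations, not taken from the paper. The system equation (next state = order-up-to level minus demand) and the single-period cost are treated as known; only the demand distribution is replaced by the empirical distribution of the observed samples, so each expectation becomes a sample average. The resulting empirical MDP is then solved by backward induction as if the estimates were correct.

```python
from functools import lru_cache


def solve_empirical_inventory(demand_samples, horizon, c, h, p, y_max):
    """Hypothetical sketch: solve an *empirical* finite-horizon inventory MDP.

    Assumptions (illustrative, not from the paper): backlogging, zero lead time,
    integer inventory levels, i.i.d. demand, linear ordering cost c, per-unit
    holding cost h and backorder cost p, order-up-to levels capped at y_max.
    """
    samples = tuple(demand_samples)  # observed (historical or simulated) demands
    n = len(samples)

    @lru_cache(maxsize=None)
    def value(t, x):
        """Return (empirical optimal cost-to-go from period t in state x, best y)."""
        if t == horizon:
            return 0.0, None  # no terminal cost in this sketch
        best_cost, best_y = float("inf"), None
        # Feasible order-up-to levels: y >= x (no disposal), capped at y_max
        # unless the current state already exceeds it.
        for y in range(x, max(x, y_max) + 1):
            # Expectation w.r.t. the empirical distribution = sample average.
            # Known system equation: the next state is y - d.
            expected = sum(
                h * max(y - d, 0) + p * max(d - y, 0) + value(t + 1, y - d)[0]
                for d in samples
            ) / n
            cost = c * (y - x) + expected
            if cost < best_cost:  # ties keep the smallest y
                best_cost, best_y = cost, y
        return best_cost, best_y

    return value


# Usage: one period, starting inventory 0.
value = solve_empirical_inventory([0, 1, 2, 3], horizon=1, c=0.0, h=1.0, p=3.0, y_max=5)
cost, order_up_to = value(0, 0)
print(cost, order_up_to)  # prints: 1.5 2
```

In this one-period example the sketch returns an empirical newsvendor solution at the critical ratio p/(p + h) = 0.75. The returned cost is the optimal cost of the empirical MDP, not the true expected cost of the computed policy; the paper's guarantees concern how far the latter can be from the true optimal cost.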

Keywords: dynamic programming/optimal control; Markov; inventory/production; statistics; estimation
Date: 2012

Downloads: http://dx.doi.org/10.1287/opre.1120.1090

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:60:y:2012:i:5:p:1267-1281

More articles in Operations Research from INFORMS.