EconPapers    
Economics at your fingertips  
 

Markov Decision Processes with Sample Path Constraints: The Communicating Case

Keith W. Ross and Ravi Varadarajan
Additional contact information
Keith W. Ross: University of Pennsylvania, Philadelphia, Pennsylvania
Ravi Varadarajan: University of Florida, Gainesville, Florida

Operations Research, 1989, vol. 37, issue 5, 780-790

Abstract: We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Although the two criteria are equivalent for certain classes of MDPs, their feasible and optimal policies differ for many nontrivial problems. In general, there does not exist optimal or nearly optimal stationary policies when the expected average-cost constraint is employed. Assuming that a policy exists that meets the sample-path constraint, we establish that there exist nearly optimal stationary policies for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well known results from the theory of stochastic processes and linear programming. The techniques lead to simple proofs of the existence of optimal and nearly optimal stationary policies for unichain and deterministic MDPs, respectively.

Keywords: dynamic programming; Markov finite state: time-average case; probability; Markov processes: sample-path constraints (search for similar items in EconPapers)
Date: 1989
References: Add references at CitEc
Citations: View citations in EconPapers (5)

Downloads: (external link)
http://dx.doi.org/10.1287/opre.37.5.780 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:37:y:1989:i:5:p:780-790

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:37:y:1989:i:5:p:780-790