EconPapers    
Economics at your fingertips  
 

Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints

Keith W. Ross
Additional contact information
Keith W. Ross: University of Pennsylvania, Philadelphia, Pennsylvania

Operations Research, 1989, vol. 37, issue 3, 474-477

Abstract: The Markov decision problem of locating a policy to maximize the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with one recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K ; consequently, it is never necessary to randomize in more than K states. A linear program produces the optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.

Keywords: dynamic programming: time average Markov decision processes; dynamic programming; Markov; finite state: linear programming algorithm (search for similar items in EconPapers)
Date: 1989
References: Add references at CitEc
Citations: View citations in EconPapers (4)

Downloads: (external link)
http://dx.doi.org/10.1287/opre.37.3.474 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:37:y:1989:i:3:p:474-477

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:37:y:1989:i:3:p:474-477