Multiple Policy Improvements in Undiscounted Markov Renewal Programming
Paul J. Schweitzer
Additional contact information
Paul J. Schweitzer: Institute for Defense Analyses, Arlington, Virginia
Operations Research, 1971, vol. 19, issue 3, 784-793
Abstract:
This paper examines, for undiscounted unichain Markov renewal programming, both the Hastings policy-value iteration algorithm and the case of multiple policy improvements between each policy evaluation. The modified policy-improvement procedure proposed by Hastings either increases the gain rate or maintains it, and it achieves a larger value improvement in some transient state than in any recurrent state. This prevents cycling and ensures convergence of the policy-value iteration algorithm. Multiple policy improvements, using either the unmodified or the modified policy-improvement procedure, are shown to ultimately settle upon higher-gain policies, if any exist. The iterated policy improvements, each time using the improved values, also lead to upper and lower bounds on the maximal gain rate.
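To illustrate the kind of gain-rate bounds the abstract refers to, the following is a minimal sketch, not the paper's algorithm: undiscounted relative value iteration with a greedy policy-improvement step on a small, hypothetical unichain Markov decision process (a Markov renewal program with unit transition times). The states, actions, rewards, and transition probabilities below are made up for illustration; the minimum and maximum of the value-iteration increments supply lower and upper bounds on the maximal gain rate.

# Minimal sketch: relative value iteration with greedy policy improvement
# on a hypothetical unichain MDP (unit transition times assumed).
import numpy as np

# Hypothetical 3-state, 2-action example (all numbers are illustrative).
rewards = np.array([[1.0, 2.0],
                    [0.5, 1.5],
                    [3.0, 0.0]])                     # rewards[s, a]
P = np.array([[[0.5, 0.5, 0.0], [0.1, 0.0, 0.9]],
              [[0.0, 0.6, 0.4], [0.3, 0.3, 0.4]],
              [[0.2, 0.4, 0.4], [1.0, 0.0, 0.0]]])   # P[s, a, s']

v = np.zeros(3)                                      # relative value estimates
for iteration in range(200):
    # One sweep: Q[s, a] = r(s, a) + sum_j p(j | s, a) * v(j)
    Q = rewards + P @ v
    v_new = Q.max(axis=1)
    policy = Q.argmax(axis=1)                        # greedy policy-improvement step

    # Increments v_new - v bound the maximal gain rate from below and above.
    increments = v_new - v
    lower, upper = increments.min(), increments.max()
    if upper - lower < 1e-8:                         # bounds have closed on the gain
        break
    v = v_new - v_new[0]                             # renormalize to keep values bounded

print("greedy policy:", policy)
print("gain-rate bounds: [%.6f, %.6f]" % (lower, upper))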
Date: 1971
Downloads: http://dx.doi.org/10.1287/opre.19.3.784 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:19:y:1971:i:3:p:784-793