Multiple Policy Improvements in Undiscounted Markov Renewal Programming
Paul J. Schweitzer
Additional contact information
Paul J. Schweitzer: Institute for Defense Analyses, Arlington, Virginia
Operations Research, 1971, vol. 19, issue 3, 784-793
Abstract:
This paper examines, for undiscounted unichain Markov renewal programming, both the Hastings policy-value iteration algorithm and the case of multiple policy improvements between each policy evaluation. The modified policy-improvement procedure proposed by Hastings either increases the gain rate or maintains it, and it achieves a larger value improvement in some transient state than in any recurrent state. This prevents cycling and ensures convergence of the policy-value iteration algorithm. Multiple policy improvements, using either the unmodified or the modified policy-improvement procedure, are shown to ultimately settle upon higher-gain policies, if any exist. The iterated policy improvements, each time using the improved values, also lead to upper and lower bounds on the maximal gain rate.
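To illustrate the kind of gain-rate bounds the abstract refers to, the following is a minimal sketch, not the paper's algorithm: undiscounted relative value iteration with a greedy policy-improvement step on a small, hypothetical unichain Markov decision process (a Markov renewal program with unit transition times). The states, actions, rewards, and transition probabilities below are made up for illustration; the minimum and maximum of the value-iteration increments supply lower and upper bounds on the maximal gain rate.

# Minimal sketch: relative value iteration with greedy policy improvement
# on a hypothetical unichain MDP (unit transition times assumed).
import numpy as np

# Hypothetical 3-state, 2-action example (all numbers are illustrative).
rewards = np.array([[1.0, 2.0],
                    [0.5, 1.5],
                    [3.0, 0.0]])                     # rewards[s, a]
P = np.array([[[0.5, 0.5, 0.0], [0.1, 0.0, 0.9]],
              [[0.0, 0.6, 0.4], [0.3, 0.3, 0.4]],
              [[0.2, 0.4, 0.4], [1.0, 0.0, 0.0]]])   # P[s, a, s']

v = np.zeros(3)                                      # relative value estimates
for iteration in range(200):
    # One sweep: Q[s, a] = r(s, a) + sum_j p(j | s, a) * v(j)
    Q = rewards + P @ v
    v_new = Q.max(axis=1)
    policy = Q.argmax(axis=1)                        # greedy policy-improvement step

    # Increments v_new - v bound the maximal gain rate from below and above.
    increments = v_new - v
    lower, upper = increments.min(), increments.max()
    if upper - lower < 1e-8:                         # bounds have closed on the gain
        break
    v = v_new - v_new[0]                             # renormalize to keep values bounded

print("greedy policy:", policy)
print("gain-rate bounds: [%.6f, %.6f]" % (lower, upper))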
Date: 1971
Downloads: http://dx.doi.org/10.1287/opre.19.3.784 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:19:y:1971:i:3:p:784-793