Perturbation Theory and Undiscounted Markov Renewal Programming
Paul J. Schweitzer (Institute for Defense Analyses, Arlington, Virginia)
Operations Research, 1969, vol. 17, issue 4, 716-727
Abstract:
A recently developed perturbation formalism for finite Markov chains is used here to analyze the policy iteration algorithm for undiscounted, single-chain Markov renewal programming. The relative values are shown to be essentially partial derivatives of the gain rate with respect to the transition probabilities, and they rank the states by indicating desirable changes in the probabilistic structure. This both implies the optimality of nonrandomized policies and suggests a gradient technique for optimizing the gain rate with respect to a parameter. The policy iteration algorithm is shown to be a steepest-ascent technique in policy space: the successor to a given policy is chosen in the direction that maximizes the directional derivative of the gain rate. The appearance of the original policy's gain and relative values in the policy-improvement step is explained by the fact that they essentially determine the gradient of the gain rate.
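For orientation, the sketch below illustrates the generic Howard-Jewell policy iteration scheme for a single-chain, undiscounted Markov renewal program that the abstract analyzes: value determination solves for the gain rate g and relative values v (with one reference state pinned at zero), and policy improvement maximizes the test quantity r_i(a) - g*tau_i(a) + sum_j p_ij(a) v_j in each state. This is not the paper's own code; the problem data (states, actions, rewards, holding times, transition probabilities) and the tie-breaking tolerance are hypothetical illustrations.

```python
# Minimal sketch of policy iteration for an undiscounted (average-reward)
# single-chain Markov renewal program.  All numerical data are hypothetical.
import numpy as np

# Hypothetical problem: 3 states, 2 actions per state.
# P[a][i, j] = transition probability, r[a][i] = expected one-step reward,
# tau[a][i] = expected holding time in state i under action a.
P = [np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.3, 0.3, 0.4]]),
     np.array([[0.2, 0.5, 0.3],
               [0.4, 0.2, 0.4],
               [0.5, 0.1, 0.4]])]
r   = [np.array([4.0, 1.0, 2.0]), np.array([3.0, 2.5, 0.5])]
tau = [np.array([1.0, 2.0, 1.5]), np.array([0.5, 1.0, 2.0])]
n_states, n_actions = 3, 2


def evaluate(policy):
    """Value determination: solve v_i + g*tau_i = r_i + sum_j P_ij v_j,
    with the relative value of the last state fixed at zero."""
    Pd   = np.array([P[policy[i]][i] for i in range(n_states)])
    rd   = np.array([r[policy[i]][i] for i in range(n_states)])
    taud = np.array([tau[policy[i]][i] for i in range(n_states)])
    # Unknowns: v_0, ..., v_{n-2}, g  (reference state v_{n-1} = 0).
    A = np.zeros((n_states, n_states))
    A[:, :-1] = np.eye(n_states)[:, :-1] - Pd[:, :-1]
    A[:, -1] = taud
    sol = np.linalg.solve(A, rd)
    v = np.append(sol[:-1], 0.0)
    g = sol[-1]
    return g, v


def improve(policy, g, v):
    """Policy improvement: maximize the test quantity
    r_i(a) - g*tau_i(a) + sum_j P_ij(a) v_j in each state."""
    new = []
    for i in range(n_states):
        tests = [r[a][i] - g * tau[a][i] + P[a][i] @ v for a in range(n_actions)]
        best = int(np.argmax(tests))
        # Keep the current action unless another is strictly better,
        # the usual tie-breaking rule that guarantees termination.
        new.append(best if tests[best] > tests[policy[i]] + 1e-12 else policy[i])
    return new


policy = [0] * n_states
while True:
    g, v = evaluate(policy)
    new_policy = improve(policy, g, v)
    if new_policy == policy:
        break
    policy = new_policy

print("optimal policy:", policy, " gain rate:", round(g, 4))
```

In this reading, the relative values v returned by value determination are the quantities the paper interprets as (essentially) partial derivatives of the gain rate, and the improvement step's use of the current policy's g and v corresponds to moving along the steepest-ascent direction in policy space.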
Date: 1969
Downloads: http://dx.doi.org/10.1287/opre.17.4.716 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:17:y:1969:i:4:p:716-727