Perturbation Theory and Undiscounted Markov Renewal Programming
Paul J. Schweitzer (Institute for Defense Analyses, Arlington, Virginia)
Operations Research, 1969, vol. 17, issue 4, 716-727
Abstract:
A recently developed perturbation formalism for finite Markov chains is used here to analyze the policy iteration algorithm for undiscounted, single-chain Markov renewal programming. The relative values are shown to be essentially partial derivatives of the gain rate with respect to the transition probabilities, and they rank the states by indicating desirable changes in the probabilistic structure. This both implies the optimality of nonrandomized policies and suggests a gradient technique for optimizing the gain rate with respect to a parameter. The policy iteration algorithm is shown to be a steepest-ascent technique in policy space: the successor to a given policy is chosen in the direction that maximizes the directional derivative of the gain rate. The appearance of the original policy's gain and relative values in the policy-improvement step is explained by the fact that they essentially determine the gradient of the gain rate.
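For orientation, the sketch below illustrates the generic Howard-Jewell policy iteration scheme for a single-chain, undiscounted Markov renewal program that the abstract analyzes: value determination solves for the gain rate g and relative values v (with one reference state pinned at zero), and policy improvement maximizes the test quantity r_i(a) - g*tau_i(a) + sum_j p_ij(a) v_j in each state. This is not the paper's own code; the problem data (states, actions, rewards, holding times, transition probabilities) and the tie-breaking tolerance are hypothetical illustrations.

```python
# Minimal sketch of policy iteration for an undiscounted (average-reward)
# single-chain Markov renewal program.  All numerical data are hypothetical.
import numpy as np

# Hypothetical problem: 3 states, 2 actions per state.
# P[a][i, j] = transition probability, r[a][i] = expected one-step reward,
# tau[a][i] = expected holding time in state i under action a.
P = [np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.3, 0.3, 0.4]]),
     np.array([[0.2, 0.5, 0.3],
               [0.4, 0.2, 0.4],
               [0.5, 0.1, 0.4]])]
r   = [np.array([4.0, 1.0, 2.0]), np.array([3.0, 2.5, 0.5])]
tau = [np.array([1.0, 2.0, 1.5]), np.array([0.5, 1.0, 2.0])]
n_states, n_actions = 3, 2


def evaluate(policy):
    """Value determination: solve v_i + g*tau_i = r_i + sum_j P_ij v_j,
    with the relative value of the last state fixed at zero."""
    Pd   = np.array([P[policy[i]][i] for i in range(n_states)])
    rd   = np.array([r[policy[i]][i] for i in range(n_states)])
    taud = np.array([tau[policy[i]][i] for i in range(n_states)])
    # Unknowns: v_0, ..., v_{n-2}, g  (reference state v_{n-1} = 0).
    A = np.zeros((n_states, n_states))
    A[:, :-1] = np.eye(n_states)[:, :-1] - Pd[:, :-1]
    A[:, -1] = taud
    sol = np.linalg.solve(A, rd)
    v = np.append(sol[:-1], 0.0)
    g = sol[-1]
    return g, v


def improve(policy, g, v):
    """Policy improvement: maximize the test quantity
    r_i(a) - g*tau_i(a) + sum_j P_ij(a) v_j in each state."""
    new = []
    for i in range(n_states):
        tests = [r[a][i] - g * tau[a][i] + P[a][i] @ v for a in range(n_actions)]
        best = int(np.argmax(tests))
        # Keep the current action unless another is strictly better,
        # the usual tie-breaking rule that guarantees termination.
        new.append(best if tests[best] > tests[policy[i]] + 1e-12 else policy[i])
    return new


policy = [0] * n_states
while True:
    g, v = evaluate(policy)
    new_policy = improve(policy, g, v)
    if new_policy == policy:
        break
    policy = new_policy

print("optimal policy:", policy, " gain rate:", round(g, 4))
```

In this reading, the relative values v returned by value determination are the quantities the paper interprets as (essentially) partial derivatives of the gain rate, and the improvement step's use of the current policy's g and v corresponds to moving along the steepest-ascent direction in policy space.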
Date: 1969
Downloads: http://dx.doi.org/10.1287/opre.17.4.716 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:17:y:1969:i:4:p:716-727