On the Optimality of Structured Policies in Countable Stage Decision Processes

Evan L. Porteus (Stanford University)

Management Science, 1975, vol. 22, issue 2, 148-157

Abstract: Multi-stage decision processes are considered, in notation that is an outgrowth of that introduced by Denardo [Denardo, E. 1967. Contraction mappings in the theory underlying dynamic programming. SIAM Rev. 9 165-177.]. Certain Markov decision processes, stochastic games, and risk-sensitive Markov decision processes can be formulated in this notation. We identify conditions sufficient to prove that, in infinite horizon nonstationary processes, the optimal infinite horizon (present) value exists, is uniquely defined, is what is called "structured," and can be found by solving Bellman's optimality equations; that ε-optimal strategies exist; that an optimal strategy can be found by applying Bellman's optimality criterion; and that a specially identified kind of policy, called a "structured" policy, is optimal in each stage. A link is thus drawn between (i) studies such as those of Blackwell [Blackwell, D. 1965. Discounted dynamic programming. Ann. Math. Stat. 36 226-235.] and Strauch [Strauch, R. 1966. Negative dynamic programming. Ann. Math. Stat. 37 871-890.], where general policies for general processes are considered, and (ii) other studies, such as those of Scarf [Scarf, H. 1963. The optimality of (S, s) policies in the dynamic inventory problem. H. Scarf, D. Gilford, M. Shelly, eds. Mathematical Methods in the Social Sciences. Stanford University Press, Stanford.] and Derman [Derman, C. 1963. On optimal replacement rules when changes of state are Markovian. R. Bellman, ed. Mathematical Optimization Techniques. University of California Press, Berkeley.], where structured policies for special processes are considered. Those familiar with dynamic programming models (e.g., inventory, queueing optimization, replacement, optimal stopping) will be well acquainted with the use of what we call structured policies and value functions. The infinite stage results are built on finite stage results. Results for the stationary infinite horizon case are also included. As an application, we provide conditions sufficient to prove that an optimal stationary strategy exists in a discounted stationary risk-sensitive Markov decision process with constant risk aversion. In Porteus [Porteus, E. On the optimality of structured policies in countable stage decision processes. Research Paper No. 141, Graduate School of Business, Stanford University, 71 pp., 1973, 1974, unabridged version of present paper.], of which this is a condensation, we also (i) show how known conditions under which a Borel measurable policy is optimal in an infinite horizon, nonstationary Markov decision process fit into our framework, and (ii) provide conditions under which a generalized (s, S) policy [Porteus, E. 1971. On the optimality of generalized (s, S) policies. Management Sci. 17 411-426.] is optimal in an infinite horizon nonstationary inventory process.
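To make the contraction-mapping machinery behind these results concrete, here is a minimal, self-contained sketch; it is not taken from the paper, and all model data (state count, discount factor, cost numbers) are invented for illustration. It runs value iteration on a toy machine-replacement problem in the spirit of Derman's replacement rules: the Bellman operator is a β-contraction in the sup norm, so successive approximation converges to the unique fixed point, and the optimal policy that emerges is "structured" in the sense used above, namely a control-limit policy that replaces exactly on an upper interval of states.

import numpy as np

N = 10          # machine states 0 (new) .. N (worst); assumed toy size
beta = 0.9      # discount factor: Bellman operator is a beta-contraction
R = 5.0         # assumed fixed replacement cost
p = 0.3         # assumed per-period probability of deteriorating one state

def cost(s):
    # Operating cost grows with deterioration (assumed, monotone in s).
    return 0.5 * s

def bellman(v):
    # One application of the Bellman operator:
    # (Hv)(s) = min{ keep cost, replace cost } with expected discounted continuation.
    keep = np.empty(N + 1)
    for s in range(N + 1):
        nxt = min(s + 1, N)
        keep[s] = cost(s) + beta * ((1 - p) * v[s] + p * v[nxt])
    # Replacing resets the machine to state 0 before operating this period.
    replace = R + cost(0) + beta * ((1 - p) * v[0] + p * v[1])
    return np.minimum(keep, replace), keep > replace

v = np.zeros(N + 1)
for _ in range(1000):
    v_new, _ = bellman(v)
    if np.max(np.abs(v_new - v)) < 1e-10:   # sup-norm contraction: geometric convergence
        v = v_new
        break
    v = v_new

_, replace_set = bellman(v)
print("fixed-point value function:", np.round(v, 3))
print("replace in states:", np.where(replace_set)[0])

With these assumed numbers the printed replacement set comes out as a contiguous block {s*, ..., N}; that the structural property survives the passage to the infinite horizon fixed point is exactly the kind of conclusion the paper's framework is designed to deliver.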

Date: 1975
Citations: 10

Downloads: http://dx.doi.org/10.1287/mnsc.22.2.148 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:22:y:1975:i:2:p:148-157
