The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
Richard D. Smallwood (Stanford University, Stanford, California, and Xerox Palo Alto Research Center, Palo Alto, California)
Edward J. Sondik (Stanford University, Stanford, California)
Operations Research, 1973, vol. 21, issue 5, 1071-1088
Abstract:
This paper formulates the optimal control problem for a class of mathematical models in which the system to be controlled is characterized by a finite-state discrete-time Markov process. The states of this internal process are not directly observable by the controller; rather, he has available a set of observable outputs that are only probabilistically related to the internal state of the system. The formulation is illustrated by a simple machine-maintenance example, and other specific application areas are also discussed. The paper demonstrates that, if there are only a finite number of control intervals remaining, then the optimal payoff function is a piecewise-linear, convex function of the current state probabilities of the internal Markov process. In addition, an algorithm for utilizing this property to calculate the optimal control policy and payoff function for any finite horizon is outlined. These results are illustrated by a numerical example for the machine-maintenance problem.
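The paper's key result is that the optimal payoff function with n control intervals remaining is piecewise linear and convex in the vector of state probabilities, so it can be represented exactly by a finite set of "alpha" vectors whose pointwise maximum is the value function. As a rough illustration (not the authors' algorithm itself), the Python/NumPy sketch below implements the Bayesian belief update and an exhaustive one-step value backup over alpha vectors; the pruning that makes the paper's one-pass algorithm practical is omitted, and all model arrays in the usage portion are hypothetical placeholders rather than the paper's machine-maintenance parameters.

    import itertools
    import numpy as np

    def belief_update(belief, a, o, T, O):
        """Bayes update of the state probabilities after taking action a
        and observing output o."""
        unnorm = (belief @ T[a]) * O[a][:, o]
        return unnorm / unnorm.sum()

    def backup(alphas, T, O, R):
        """One control interval of finite-horizon value iteration.

        The n-stage value function is V_n(pi) = max_k pi . alpha_k,
        piecewise linear and convex in the belief pi.

        T[a][s, s'] -- transition probabilities p(s' | s, a)
        O[a][s', o] -- output probabilities p(o | s', a)
        R[a][s]     -- expected immediate payoff for action a in state s
        """
        n_obs = O[0].shape[1]
        new_alphas = []
        for a in range(len(T)):
            # g[o][k](s) = sum_{s'} p(s'|s,a) p(o|s',a) alpha_k(s')
            g = [[T[a] @ (O[a][:, o] * alpha) for alpha in alphas]
                 for o in range(n_obs)]
            # One candidate vector per assignment of a successor alpha
            # vector to each observation (exhaustive enumeration).
            for choice in itertools.product(range(len(alphas)), repeat=n_obs):
                new_alphas.append(R[a] + sum(g[o][k] for o, k in enumerate(choice)))
        return new_alphas

    def value(alphas, belief):
        """Optimal expected payoff at a belief: the upper surface of the
        hyperplanes defined by the alpha vectors."""
        return max(float(belief @ alpha) for alpha in alphas)

    # Hypothetical 2-state, 2-action, 2-output instance (illustrative
    # numbers only, not the paper's machine-maintenance example).
    T = [np.array([[0.9, 0.1], [0.0, 1.0]]),   # action 0: keep operating
         np.array([[1.0, 0.0], [1.0, 0.0]])]   # action 1: repair
    O = [np.array([[0.8, 0.2], [0.3, 0.7]]),
         np.array([[0.8, 0.2], [0.3, 0.7]])]
    R = [np.array([1.0, 0.0]), np.array([-0.5, -0.5])]

    alphas = [np.zeros(2)]          # zero terminal payoff
    for _ in range(3):              # three control intervals remaining
        alphas = backup(alphas, T, O, R)
    print(value(alphas, np.array([0.5, 0.5])))

Each stage of this exhaustive backup generates n_actions * |alphas| ** n_obs candidate vectors, which is exactly why the paper's procedure for discarding dominated vectors matters in practice.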
Date: 1973
Citations: 74 (as tracked by EconPapers)
Download: http://dx.doi.org/10.1287/opre.21.5.1071 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:21:y:1973:i:5:p:1071-1088