The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs
Edward J. Sondik (Stanford University, Stanford, California)
Operations Research, 1978, vol. 26, issue 2, 282-304
Abstract:
This paper treats the discounted cost, optimal control problem for Markov processes with incomplete state information. The optimization approach for these partially observable Markov processes is a generalization of the well-known policy iteration technique for finding optimal stationary policies for completely observable Markov processes. The state space for the problem is the space of state occupancy probability distributions (the unit simplex). The development of the algorithm introduces several new ideas, including the class of finitely transient policies, which are shown to possess piecewise linear cost functions. The paper develops easily implemented approximations to stationary policies based on these finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence. The paper closes with a detailed example illustrating the application of the algorithm to the two-state partially observable Markov process.
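To make the abstract's two central ingredients concrete, the following is a minimal sketch in Python of (a) the Bayesian update of the state occupancy distribution over the unit simplex and (b) a piecewise linear cost function represented as the minimum of a finite set of linear pieces, whose minimum is concave as the abstract's "concave hull" suggests. All matrices, values, and names below are illustrative assumptions for a generic two-state model, not data or code from the paper.

```python
import numpy as np

# Illustrative two-state model (hypothetical numbers, not from the paper).
P = np.array([[0.9, 0.1],    # P[i, j] = prob. of transitioning from state i to j
              [0.4, 0.6]])
R = np.array([[0.8, 0.2],    # R[j, o] = prob. of observing o when in state j
              [0.3, 0.7]])

def belief_update(b, o):
    """Bayes update of the state occupancy distribution after observing o."""
    unnorm = R[:, o] * (b @ P)     # predict one step, then weight by likelihood
    return unnorm / unnorm.sum()   # renormalize back onto the unit simplex

# A piecewise linear cost function over the simplex: the minimum over a
# finite set of linear pieces; the minimum of linear functions is concave.
alphas = np.array([[2.0, 5.0],     # hypothetical pieces, one cost per state
                   [4.0, 3.0]])

def cost(b):
    return np.min(alphas @ b)

b = np.array([0.5, 0.5])           # start from the uniform belief
b = belief_update(b, o=1)
print(b, cost(b))
```

Policy iteration in the paper operates on exactly such objects: policies whose cost functions remain piecewise linear (the finitely transient class) admit finite representations of this kind.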
Date: 1978
Full text: http://dx.doi.org/10.1287/opre.26.2.282 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:26:y:1978:i:2:p:282-304