The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs

Edward J. Sondik
Additional contact information
Edward J. Sondik: Stanford University, Stanford, California

Operations Research, 1978, vol. 26, issue 2, 282-304

Abstract: This paper treats the discounted cost, optimal control problem for Markov processes with incomplete state information. The optimization approach for these partially observable Markov processes is a generalization of the well-known policy iteration technique for finding optimal stationary policies for completely observable Markov processes. The state space for the problem is the space of state occupancy probability distributions (the unit simplex). The development of the algorithm introduces several new ideas, including the class of finitely transient policies, which are shown to possess piecewise linear cost functions. The paper develops easily implemented approximations to stationary policies based on these finitely transient policies and shows that the concave hull of an approximation can be included in the well-known Howard policy improvement algorithm with subsequent convergence. The paper closes with a detailed example illustrating the application of the algorithm to the two-state partially observable Markov process.
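To make the setting concrete, the following is a minimal Python sketch (not from the paper) of the belief-state machinery the abstract refers to: a Bayes update of the state-occupancy distribution over the unit simplex for a hypothetical two-state process, and a piecewise linear cost function stored as a finite set of linear pieces and evaluated by taking their minimum. The transition matrix P, observation matrix Q, and cost vectors below are illustrative placeholders, not values from Sondik's example.

```python
import numpy as np

# Hypothetical two-state partially observable Markov process under one fixed action.
P = np.array([[0.9, 0.1],   # P[i, j] = Pr(next state j | current state i)
              [0.4, 0.6]])
Q = np.array([[0.8, 0.2],   # Q[j, o] = Pr(observation o | next state j)
              [0.3, 0.7]])

def belief_update(b, obs):
    """Bayes update of the state-occupancy distribution b (a point in the
    unit simplex) after acting and observing obs."""
    unnormalized = (b @ P) * Q[:, obs]
    return unnormalized / unnormalized.sum()

# A piecewise linear cost function over the simplex, stored as a finite set
# of linear pieces ("alpha" vectors); illustrative values only.
alphas = np.array([[2.0, 5.0],
                   [4.0, 3.0]])

def cost(b):
    # For a cost criterion, the value at belief b is the minimum of the
    # linear pieces, which is a concave function of b.
    return np.min(alphas @ b)

b = np.array([0.5, 0.5])
b_next = belief_update(b, obs=1)
print(b_next, cost(b_next))
```

Taking the minimum of linear pieces yields a concave, piecewise linear function of the belief, which is consistent with the concave hull construction the abstract mentions for the policy improvement step.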

Date: 1978
Citations: 27 (tracked in EconPapers)

Downloads: http://dx.doi.org/10.1287/opre.26.2.282 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:26:y:1978:i:2:p:282-304

Handle: RePEc:inm:oropre:v:26:y:1978:i:2:p:282-304