Solution Procedures for Partially Observed Markov Decision Processes
Chelsea C. White and William T. Scherer
Additional contact information
Chelsea C. White: University of Virginia, Charlottesville, Virginia
William T. Scherer: University of Virginia, Charlottesville, Virginia
Operations Research, 1989, vol. 37, issue 5, 791-797
Abstract:
We present three algorithms to solve the infinite horizon, expected discounted total reward partially observed Markov decision process (POMDP). Each algorithm integrates a successive approximations algorithm for the POMDP due to R. D. Smallwood and E. J. Sondik with an appropriately generalized numerical technique that has been shown to reduce the CPU time required for convergence in the completely observed case. The first technique is reward revision. The second is reward revision integrated with modified policy iteration. The third is a standard extrapolation. A numerical study indicates the potentially significant computational value of these algorithms.
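For readers unfamiliar with successive approximations on the belief space, the sketch below illustrates the underlying value-iteration backup for a tiny, made-up two-state POMDP, using a simple belief grid with nearest-point interpolation. It is only an illustration of the basic Bellman operator on beliefs; the paper works with the Smallwood-Sondik piecewise-linear value representation and accelerates it with reward revision, modified policy iteration, and extrapolation, none of which are shown here. All model data and the grid discretization are illustrative assumptions.

import numpy as np

# Illustrative belief-space value iteration for a tiny POMDP.
# Hypothetical 2-state, 2-action, 2-observation model (all numbers made up).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[a, s, s']: transition probabilities
              [[0.5, 0.5], [0.4, 0.6]]])
Q = np.array([[[0.8, 0.2], [0.3, 0.7]],      # Q[a, s', z]: observation probabilities
              [[0.6, 0.4], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],                    # r[a, s]: immediate rewards
              [0.0, 1.5]])
beta = 0.95                                  # discount factor

# Discretize the belief simplex: belief = (b, 1 - b) for b on a grid.
grid = np.linspace(0.0, 1.0, 101)

def backup(V):
    """One Bellman backup of the value function V defined on the belief grid."""
    V_new = np.empty_like(V)
    for i, b in enumerate(grid):
        belief = np.array([b, 1.0 - b])
        best = -np.inf
        for a in range(P.shape[0]):
            value = belief @ r[a]                      # expected immediate reward
            for z in range(Q.shape[2]):
                # Joint probability of next state and observation z, given action a.
                joint = (belief @ P[a]) * Q[a][:, z]
                pz = joint.sum()                       # probability of observing z
                if pz > 0:
                    next_b = joint[0] / pz             # Bayesian belief update
                    # Nearest-grid-point approximation of V at the updated belief.
                    value += beta * pz * V[np.abs(grid - next_b).argmin()]
            best = max(best, value)
        V_new[i] = best
    return V_new

V = np.zeros_like(grid)
for _ in range(200):                                   # successive approximations
    V_next = backup(V)
    if np.max(np.abs(V_next - V)) < 1e-6:
        break
    V = V_next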
Keywords: dynamic programming: Markov; finite state
Date: 1989
Downloads: http://dx.doi.org/10.1287/opre.37.5.791 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:37:y:1989:i:5:p:791-797