Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces

Quanxin Zhu, Xinsong Yang and Chuangxia Huang

Abstract and Applied Analysis, 2009, vol. 2009, 1-17

Abstract:

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion we are concerned with is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
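The paper treats general (Polish) state and action spaces with possibly unbounded rates; the structure of the PIA is easiest to see in a toy finite-state case. The sketch below is an illustration only, with hypothetical transition rate matrices and reward rates, not the authors' construction: policy evaluation solves the average reward (Poisson) equation g = r(s, π(s)) + Σ_s' q(s'|s, π(s)) h(s') for the gain g and bias h (normalized by h(0) = 0), and policy improvement maximizes r(s, a) + Σ_s' q(s'|s, a) h(s') over actions.

```python
import numpy as np

# Toy finite-state illustration of the policy iteration algorithm (PIA)
# for a continuous-time jump MDP under the expected average reward
# criterion. States, actions, rate matrices Q[a] (rows sum to 0), and
# reward rates r[s, a] are all hypothetical.
n_states, n_actions = 2, 2
Q = np.array([
    [[-1.0, 1.0], [2.0, -2.0]],   # transition rates under action 0
    [[-3.0, 3.0], [0.5, -0.5]],   # transition rates under action 1
])
r = np.array([[1.0, 4.0],         # reward rates r(s, a)
              [2.0, 0.5]])

def evaluate(policy):
    """Solve the evaluation (Poisson) equation
       g = r(s, pi(s)) + sum_s' q(s'|s, pi(s)) h(s'),
    with normalization h(0) = 0, for the gain g and bias h."""
    Qpi = Q[policy, np.arange(n_states)]   # row s of Q[policy[s]]
    rpi = r[np.arange(n_states), policy]   # reward rate under the policy
    # Unknown vector: (g, h(1), ..., h(n-1)); h(0) is pinned to 0.
    A = np.zeros((n_states, n_states))
    A[:, 0] = 1.0                          # coefficient of g
    A[:, 1:] = -Qpi[:, 1:]                 # coefficients of h(1..n-1)
    sol = np.linalg.solve(A, rpi)
    return sol[0], np.concatenate(([0.0], sol[1:]))

def improve(h):
    """Improvement step: argmax_a of r(s, a) + sum_s' q(s'|s, a) h(s')."""
    vals = r.T + Q @ h                     # shape (n_actions, n_states)
    return vals.argmax(axis=0)

policy = np.zeros(n_states, dtype=int)
while True:
    g, h = evaluate(policy)
    new_policy = improve(h)
    if np.array_equal(new_policy, policy):
        break                              # PIA has converged
    policy = new_policy

print("average optimal stationary policy:", policy, "gain:", g)
```

On a finite model the iteration terminates at a stationary policy whose gain g solves the average reward optimality equation; the paper's contribution is establishing the analogous convergence when the state space is Polish and the rates and rewards are unbounded.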

Date: 2009

Downloads: (external link)
http://downloads.hindawi.com/journals/AAA/2009/103723.pdf (application/pdf)
http://downloads.hindawi.com/journals/AAA/2009/103723.xml (text/xml)



Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlaaa:103723

DOI: 10.1155/2009/103723


More articles in Abstract and Applied Analysis from Hindawi

Handle: RePEc:hin:jnlaaa:103723