Structures of Optimal Policies in MDPs with Unbounded Jumps: The State of Our Art

Blok, H.; Spieksma, F. M.

H. Blok and F. M. Spieksma
Additional contact information
H. Blok: Eindhoven University of Technology
F. M. Spieksma: Leiden University

Chapter 5 in Markov Decision Processes in Practice, 2017, pp 131-186 from Springer

Abstract: The derivation of structural properties of countable state Markov decision processes (MDPs) is generally based on sample path methods or value iteration arguments. In the latter case, the method is to prove the structural properties of interest inductively for the n-horizon value function. A limit argument should then allow one to deduce the structural properties for the infinite-horizon value function. For discrete time MDPs with the objective of minimising the total expected α-discounted cost, this procedure is justified under mild conditions. When the objective is to minimise the long run average expected cost, value iteration does not necessarily converge. Allowing time to be continuous does not generate any further complications when the jump rates are bounded as a function of state, because uniformisation is applicable. However, when the jump rates are unbounded as a function of state, uniformisation is applicable only after a suitable perturbation of the jump rates that does not destroy the desired structural properties. Thus, a second limit argument is also required. The importance of unbounded rate countable state MDPs has increased lately, due to applications modelling customer or patient impatience and abandonment. The theory validating the required limit arguments, however, does not seem to be complete, and results are scattered over the literature. In this chapter our objective has been to provide a systematic way to tackle this problem under relatively mild conditions, and to provide the necessary theory validating the presented approach. The base model is a parametrised Markov process (MP): both perturbed MPs and MDPs are special cases of a parametrised MP. The advantage is that the parameter can simultaneously model a policy and a perturbation.

Date: 2017
Citations: View citations in EconPapers (3)

Persistent link: https://EconPapers.repec.org/RePEc:spr:isochp:978-3-319-47766-4_5

Ordering information: This item can be ordered from
http://www.springer.com/9783319477664

DOI: 10.1007/978-3-319-47766-4_5

Series: International Series in Operations Research & Management Science, Springer