Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes

Venel, Xavier; Ziliotto, Bruno

Additional contact information
Xavier Venel: CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement
Bruno Ziliotto: CEREMADE - CEntre de REcherches en MAthématiques de la DEcision - Université Paris Dauphine-PSL - PSL - Université Paris Sciences et Lettres - CNRS - Centre National de la Recherche Scientifique, Université Paris Dauphine-PSL - PSL - Université Paris Sciences et Lettres

Post-Print from HAL

Abstract: In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time average payoff.

Keywords: Uniform value; Long-run average payoff; Dynamic programming; Markov decision processes; Partial Observation
Date: 2016
Note: View the original document on HAL open archive server: https://hal.science/hal-01395429v1

Published in SIAM Journal on Control and Optimization, 2016, 54 (4), pp.1983-2008. ⟨10.1137/15M1043340⟩

Downloads: (external link)
https://hal.science/hal-01395429v1/document (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-01395429

DOI: 10.1137/15M1043340
