EconPapers    
Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces

Jérôme Renault and Xavier Venel ()
Additional contact information
Xavier Venel: PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique

Post-Print from HAL

Abstract: We study long-term Markov Decision Processes and Gambling Houses, with applications to partial observation MDPs with finitely many states and to zero-sum repeated games with an informed controller. We consider a decision-maker maximizing the weighted sum ∑_{t≥1} θ_t r_t, where r_t is the expected reward of the t-th stage. We prove the existence of a very strong notion of long-term value, called the general uniform value, expressing the fact that the decision-maker can play well independently of the evaluations (θ_t)_{t≥1} over stages, provided the total variation (or impatience) ∑_{t≥1} |θ_{t+1} − θ_t| is small enough. This result generalizes previous results of Rosenberg, Solan and Vieille [35] and Renault [31] that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling and the Aumann-Maschler cav(u) formula for repeated games with incomplete information. Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is a new metric d* such that partial observation MDPs and repeated games with an informed controller may be associated with auxiliary problems that are non-expansive with respect to d*. Given two Borel probabilities u, v over a compact subset X of a normed vector space, we define d*(u, v) = sup_{f∈D₁} |u(f) − v(f)|, where D₁ is the set of functions f satisfying: ∀x, y ∈ X, ∀a, b ≥ 0, a f(x) − b f(y) ≤ ‖ax − by‖. The particular case where X is a simplex endowed with the L¹-norm is particularly interesting: d* is the largest distance over the probabilities with finite support over X that makes every disintegration non-expansive.
Moreover, we obtain a Kantorovich-Rubinstein type duality formula for d*(u, v), involving pairs of measures (α, β) over X × X such that the first marginal of α is u and the second marginal of β is v. MSC Classification: Primary: 90C40; Secondary: 60J20, 91A15.
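
To make the impatience condition concrete, the two classical evaluations covered by the earlier results fit it as follows: a short computation shows that the n-stage arithmetic mean θ_t = 1/n (for t ≤ n, 0 afterwards) has total variation 1/n, and the λ-discounted evaluation θ_t = λ(1−λ)^{t−1} has total variation exactly λ; both become small as the horizon grows or the discount rate vanishes. A minimal Python sketch (the function name `impatience` is ours, purely illustrative) checks this:

```python
def impatience(theta):
    """Total variation sum_{t>=1} |theta_{t+1} - theta_t| of an evaluation,
    treating weights beyond the given horizon as zero."""
    padded = list(theta) + [0.0]
    return sum(abs(padded[t + 1] - padded[t]) for t in range(len(theta)))

# n-stage arithmetic mean: impatience 1/n
n = 50
cesaro = [1.0 / n] * n
print(impatience(cesaro))        # 0.02

# lambda-discounted evaluation (truncated far out): impatience ~ lambda
lam = 0.3
discounted = [lam * (1 - lam) ** t for t in range(400)]
print(impatience(discounted))    # ~0.3
```

So both families satisfy the small-impatience hypothesis, consistent with the claim that the general uniform value subsumes the arithmetic-mean and discounted cases.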

Keywords: Characterization of the value; Partial observation Markov decision processes; Uniform value; Repeated games; Wasserstein metric; Gambling Houses
Date: 2017
Citations: 8 (in EconPapers)

Published in Mathematics of Operations Research, 2017, 42 (2), pp.349-376. ⟨10.1287/moor.2016.0814⟩

There are no downloads for this item; see the EconPapers FAQ for hints about obtaining it.

Related works:
Journal Article: Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces (2017)
Working Paper: Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces (2017)
Working Paper: Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces (2017)


Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-01396680

DOI: 10.1287/moor.2016.0814


Bibliographic data for this series is maintained by CCSD.
Page updated 2025-03-22
Handle: RePEc:hal:journl:hal-01396680