Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

Jérôme Renault and Xavier Venel
Additional contact information
Xavier Venel: Paris School of Economics; and Université Paris 1 Panthéon-Sorbonne, Paris, France

Mathematics of Operations Research, 2017, vol. 42, issue 2, 349-376

Abstract: We study long-term Markov decision processes (MDPs) and gambling houses, with applications to any partial observation MDP with finitely many states and to zero-sum repeated games with an informed controller. We consider a decision maker maximizing the weighted sum ∑_{t≥1} θ_t r_t, where r_t is the expected reward of the t-th stage. We prove the existence of a very strong notion of long-term value, called the general uniform value, representing the fact that the decision maker can play well independently of the evaluations (θ_t)_{t≥1} over stages, provided the total variation (or impatience) ∑_{t≥1} |θ_{t+1} − θ_t| is small enough. This result generalizes previous results in the literature that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling and the Aumann–Maschler cav(u) formula for repeated games with incomplete information. Apart from the introduction of appropriate invariant measures, the main innovation in our proofs is a new metric, d*, such that partial observation MDPs and repeated games with an informed controller may be associated with auxiliary problems that are nonexpansive with respect to d*. Given two Borel probabilities u and v over a compact subset X of a normed vector space, we define d*(u, v) = sup_{f∈D1} |u(f) − v(f)|, where D1 is the set of functions f satisfying, for all x, y ∈ X and all a, b ≥ 0, af(x) − bf(y) ≤ ‖ax − by‖. The particular case where X is a simplex endowed with the L1-norm is particularly interesting: d* is the largest distance over the probabilities with finite support over X that makes every disintegration nonexpansive. Moreover, we obtain a Kantorovich–Rubinstein-type duality formula for d*(u, v), involving pairs of measures (α, β) over X × X such that the first marginal of α is u and the second marginal of β is v.
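For finitely supported measures, the abstract's definition of d* can be turned into a small linear program: the constraint af(x) − bf(y) ≤ ‖ax − by‖ is homogeneous of degree 1 in (a, b), so it suffices to impose it for a + b = 1, and discretizing a on a finite grid yields a relaxation whose optimum is an upper estimate of d*. The sketch below is illustrative and not from the paper; the function name `d_star_estimate` and the `grid` parameter are our own choices.

```python
# Illustrative sketch (not from the paper): LP upper estimate of the
# d* distance between two finitely supported probability measures
# on points of a simplex in R^d, using the L1 norm.
import numpy as np
from scipy.optimize import linprog

def d_star_estimate(points, u, v, grid=21):
    """Maximize sum_i (u_i - v_i) * f_i over the variables f_i = f(x_i),
    subject to a*f_i - (1-a)*f_j <= ||a*x_i - (1-a)*x_j||_1 for a on a
    finite grid in [0, 1] (a discretized relaxation of the D1 constraints;
    since D1 is symmetric under f -> -f, the absolute value is not needed)."""
    n = len(points)
    A, b = [], []
    for a in np.linspace(0.0, 1.0, grid):
        for i in range(n):
            for j in range(n):
                row = np.zeros(n)
                row[i] += a
                row[j] -= 1.0 - a
                A.append(row)
                b.append(np.linalg.norm(a * points[i] - (1 - a) * points[j], 1))
    # linprog minimizes, so negate the objective; a = 0 and a = 1 in the
    # grid already bound each f_i in [-||x_i||_1, ||x_i||_1].
    res = linprog(c=-(u - v), A_ub=np.vstack(A), b_ub=np.array(b),
                  bounds=[(None, None)] * n, method="highs")
    return -res.fun
```

For example, on the simplex of Δ({1,2}) embedded in R², one can compare a Dirac mass at the barycenter (0.5, 0.5) with the measure splitting mass equally between the vertices (1, 0) and (0, 1); refining `grid` tightens the estimate toward d*.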

Keywords: gambling houses; partial observation Markov decision processes; repeated games; Wasserstein metric; uniform value; characterization of the value
Date: 2017
Citations: 8 (in EconPapers)

Downloads: https://doi.org/10.1287/moor.2016.0814 (application/pdf)

Related works:
Working Paper: Long-term values in Markov Decision Processes and Repeated Games, and a new distance for probability spaces (2017)


Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:42:y:2017:i:2:p:349-376


More articles in Mathematics of Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-22
Handle: RePEc:inm:ormoor:v:42:y:2017:i:2:p:349-376