Reinforcement Learning with Restrictions on the Action Set
Mario Bravo and Mathieu Faure
Additional contact information 
Mario Bravo: USACH - Universidad de Santiago de Chile [Santiago]
Post-Print from HAL
Abstract:
Consider a two-player normal-form game repeated over time. We introduce an adaptive learning procedure, where the players only observe their own realized payoff at each stage. We assume that agents do not know their own payoff function and have no information on the other player. Furthermore, we assume that they have restrictions on their own actions such that, at each stage, their choice is limited to a subset of their action set. We prove that the empirical distributions of play converge to the set of Nash equilibria for zero-sum and potential games, and games where one player has two actions.
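The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the procedure analyzed in the paper, whose details this record does not give. It is a minimal illustration under stated assumptions: the game is rock-paper-scissors (a zero-sum game), each player may only repeat the last action or move to the next one cyclically (a hypothetical restriction rule), and each player keeps a running average of the payoffs realized for each action and chooses among the allowed actions by a softmax over those averages. The names `allowed_actions`, `softmax_choice`, `simulate`, and the parameter `temperature` are all introduced here for illustration.

```python
import math
import random

# Payoff matrix for player 1 in rock-paper-scissors (assumed example game).
# Player 2 receives the negative of each entry, so the game is zero-sum.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def allowed_actions(last_action, n_actions=3):
    # Hypothetical restriction: a player may repeat the last action
    # or move to the next action in cyclic order.
    return [last_action, (last_action + 1) % n_actions]


def softmax_choice(estimates, allowed, temperature, rng):
    # Sample among the allowed actions only, weighting each by
    # exp(estimate / temperature).
    weights = [math.exp(estimates[a] / temperature) for a in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]


def simulate(n_stages=5000, temperature=0.5, seed=0):
    rng = random.Random(seed)
    n = 3
    estimates = [[0.0] * n, [0.0] * n]  # per-player payoff estimates
    counts = [[0] * n, [0] * n]         # per-player action counts
    last = [0, 0]                       # last action of each player

    for _ in range(n_stages):
        chosen = [
            softmax_choice(estimates[p], allowed_actions(last[p], n),
                           temperature, rng)
            for p in range(2)
        ]
        a, b = chosen
        payoffs = (PAYOFF[a][b], -PAYOFF[a][b])
        for p in range(2):
            act = chosen[p]
            counts[p][act] += 1
            # Each player sees only their own realized payoff and updates
            # the running average for the action they just played.
            estimates[p][act] += (payoffs[p] - estimates[p][act]) / counts[p][act]
            last[p] = act

    # Empirical distribution of play for each player.
    return [[c / n_stages for c in counts[p]] for p in range(2)]


if __name__ == "__main__":
    for player, freq in enumerate(simulate(), start=1):
        print(f"player {player}:", [round(x, 3) for x in freq])
```

Running the script prints each player's empirical distribution of play. The sketch makes no claim about convergence; it only shows the informational constraints the abstract describes, namely that players do not know the payoff matrix, see only their own realized payoff, and choose from a restricted subset at each stage.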
Keywords: Quantitative economics
Date: 2015-01
Published in SIAM Journal on Control and Optimization, 2015, 53 (1), pp.287--312. ⟨10.1137/130936488⟩
Related works:
Working Paper: Reinforcement Learning with Restrictions on the Action Set (2013) 
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-01457301
DOI: 10.1137/130936488