Reinforcement learning from comparisons: Three alternatives are enough, two are not
Benoit Laslier and Jean-François Laslier
Additional contact information
Benoit Laslier: PSPM - Probabilités, statistique, physique mathématique, ICJ - Institut Camille Jordan (ECL - École Centrale de Lyon, UCBL - Université Claude Bernard Lyon 1, INSA Lyon - Institut National des Sciences Appliquées de Lyon, UJM - Université Jean Monnet - Saint-Étienne, CNRS - Centre National de la Recherche Scientifique), Université de Lyon
Post-Print from HAL
Abstract:
This paper deals with two generalizations of the Pólya urn model where, instead of sampling one ball from the urn at each step, we sample two or three balls. The processes are motivated by the problem of finding the best alternative using pairwise comparisons that are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem while, in some cases, the process with two balls sampled has almost surely no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.
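To make the two-ball process concrete, here is a minimal simulation sketch. The abstract does not state the exact update rule, so everything in it is an assumption made for illustration: the function name `simulate_two_ball_urn`, the comparison table `beats`, the initial composition of one ball per alternative, drawing with replacement, and the rule of adding one ball of the winning alternative (or of the drawn alternative when both balls match). It should not be read as the authors' definition.

```python
import random


def simulate_two_ball_urn(beats, steps, seed=0):
    """Hypothetical two-ball reinforcement urn (illustrative assumption).

    beats[i][j] is True when alternative i wins the comparison against j.
    Returns the share of each alternative in the urn after `steps` draws.
    """
    rng = random.Random(seed)
    n = len(beats)
    counts = [1] * n  # assumed start: one ball per alternative
    for _ in range(steps):
        # Draw two balls with replacement, proportionally to current counts.
        i, j = rng.choices(range(n), weights=counts, k=2)
        if i == j:
            winner = i  # assumed rule: matching draws reinforce that alternative
        else:
            winner = i if beats[i][j] else j
        counts[winner] += 1  # reinforce the winner with one extra ball
    total = sum(counts)
    return [c / total for c in counts]


# Intransitive (rock-paper-scissors style) comparisons: 0 beats 1, 1 beats 2, 2 beats 0.
cyclic = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
shares = simulate_two_ball_urn(cyclic, steps=10_000)
print(shares)
```

Running the sketch with different values of `steps` and `seed` lets one watch how the shares move under intransitive comparisons. Whether this particular rule reproduces the non-convergence described in the abstract depends on how closely it matches the paper's actual process.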
Date: 2017
Published in The Annals of Applied Probability, 2017, 27 (5), pp. 2907-2925. ⟨10.1214/16-AAP1271⟩
Related works:
Working Paper: Reinforcement learning from comparisons: Three alternatives are enough, two are not (2017)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:halshs-01630231
DOI: 10.1214/16-AAP1271