Reinforcement learning from comparisons: Three alternatives are enough, two are not

Laslier, Benoit; Laslier, Jean-François

Benoit Laslier: ICJ - Institut Camille Jordan (ECL - École Centrale de Lyon; UCBL - Université Claude Bernard Lyon 1; INSA Lyon - Institut National des Sciences Appliquées de Lyon; UJM - Université Jean Monnet - Saint-Étienne; CNRS - Centre National de la Recherche Scientifique; Université de Lyon), PSPM - Probabilités, statistique, physique mathématique (ICJ)

Post-Print from HAL

Abstract: This paper deals with two generalizations of the Pólya urn model where, instead of sampling one ball from the urn at each time step, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons that are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem, while, in some cases, the process with two balls sampled almost surely has no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.
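
The abstract does not spell out the update rules, so the following is only a hedged illustration of the general idea, not the paper's definition. It simulates one natural reading of a two-ball comparison urn on a hypothetical non-transitive comparison (rock-paper-scissors): draw two balls in proportion to the urn's contents, compare their colours, and add one ball of the winner's colour. The table `BEATS`, the with-replacement draw, the tie rule, and the names `two_ball_step` and `simulate` are all assumptions introduced here; the three-ball rule is not sketched because the abstract does not specify it.

```python
import random

# Hypothetical non-transitive comparison (rock-paper-scissors), NOT from the paper:
# BEATS[a] is the set of alternatives that alternative a beats.
BEATS = {"rock": {"scissors"}, "scissors": {"paper"}, "paper": {"rock"}}


def two_ball_step(urn, rng):
    """One step of an ASSUMED two-ball rule.

    Draw two balls with replacement, with probabilities proportional to the
    ball counts, and add one ball of the winning colour to the urn.
    """
    colours = list(urn)
    weights = [urn[c] for c in colours]
    a, b = rng.choices(colours, weights=weights, k=2)
    if b in BEATS[a]:
        winner = a
    elif a in BEATS[b]:
        winner = b
    else:
        # Neither beats the other; with this table that only happens
        # when the same colour is drawn twice.
        winner = a
    urn[winner] += 1
    return winner


def simulate(steps, seed=0):
    """Run the assumed two-ball process, recording colour proportions after each step."""
    rng = random.Random(seed)
    urn = {"rock": 1, "paper": 1, "scissors": 1}
    history = []
    for _ in range(steps):
        two_ball_step(urn, rng)
        total = sum(urn.values())
        history.append({c: n / total for c, n in urn.items()})
    return urn, history


if __name__ == "__main__":
    final_urn, history = simulate(10_000, seed=42)
    print(final_urn)
    print(history[-1])
```

Plotting `history` over a long run is one way to look at how the proportions move; the abstract's statement that the two-ball process may almost surely have no limit is a theorem about the paper's model, which a finite simulation of this assumed variant can neither confirm nor refute.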

Date: 2017

Published in The Annals of Applied Probability, 2017, 27 (5), pp.2907-2925. ⟨10.1214/16-AAP1271⟩

Related works:
Working Paper: Reinforcement learning from comparisons: Three alternatives are enough, two are not (2017)

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:halshs-01630231

DOI: 10.1214/16-AAP1271

Bibliographic data for series maintained by CCSD.