Simple Bayesian Algorithms for Best-Arm Identification
Daniel Russo
Additional contact information
Daniel Russo: Columbia University, New York, New York 10027
Operations Research, 2020, vol. 68, issue 6, 1625-1647
Abstract:
This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality, with the goal of confidently identifying the best design after a small number of measurements. This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort and formalizes a sense in which these seemingly naive rules are the best possible. One proposal is top-two probability sampling, which computes the two designs with the highest posterior probability of being optimal and then randomizes to select among these two. Another is a variant of top-two sampling that considers not only the probability that a design is optimal but also the expected amount by which its quality exceeds that of other designs. The final algorithm is a modified version of Thompson sampling that is tailored for identifying the best design. We prove that these simple algorithms satisfy a sharp optimality property. In a frequentist setting where the true quality of the designs is fixed, one hopes that the posterior definitively identifies the optimal design, in the sense that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. We show that under the proposed algorithms, this convergence occurs at an exponential rate, and the corresponding exponent is the best possible among all allocation rules. It should be highlighted that the proposed algorithms depend on a single tuning parameter, which determines the probability used when randomizing among the top-two designs. Attaining the optimal rate of posterior convergence requires that this parameter either be set optimally or be tuned adaptively toward the optimal value.
The paper goes further, characterizing the exponent attained on any problem instance and for any value of the tunable parameter. This exponent is interpreted as being optimal among a constrained class of allocation rules. Finally, considerable robustness to this parameter is established through numerical experiments and theoretical results. When this parameter is set to 1/2, the exponent attained is within a factor of 2 of best possible across all problem instances.
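To make the top-two idea concrete, the following is a minimal, hypothetical sketch of one allocation step of top-two Thompson sampling, assuming independent Gaussian posteriors over design qualities. The function name, interface, and Gaussian-posterior assumption are illustrative and not taken from the paper: a posterior sample is drawn, its argmax becomes the first candidate, and with probability beta (the single tuning parameter discussed above) that candidate is measured; otherwise sampling repeats until a different design is the argmax, and that challenger is measured instead.

```python
import numpy as np

def top_two_thompson_step(rng, post_means, post_vars, beta=0.5):
    """One allocation step of top-two Thompson sampling (illustrative sketch).

    post_means, post_vars: arrays giving the current Gaussian posterior
    mean and variance for each design's quality.
    beta: probability of measuring the first candidate rather than a challenger.
    Returns the index of the design to measure next.
    """
    # Draw one sample of each design's quality from its posterior.
    sample = rng.normal(post_means, np.sqrt(post_vars))
    first = int(np.argmax(sample))  # first candidate: posterior-sample argmax
    if rng.random() < beta:
        return first
    # Otherwise, resample until a *different* design is the argmax;
    # this challenger is the second of the "top two".
    while True:
        resample = rng.normal(post_means, np.sqrt(post_vars))
        challenger = int(np.argmax(resample))
        if challenger != first:
            return challenger
```

A usage note: setting `beta=0.5`, as in the robustness result above, splits effort evenly between the leading design and its challengers; the resampling loop terminates with probability one because Gaussian posteriors give every design positive probability of being the argmax.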
Keywords: multiarmed bandit; ranking and selection; Bayesian (search for similar items in EconPapers)
Date: 2020
Citations: 16
Downloads: https://doi.org/10.1287/opre.2019.1911 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:68:y:2020:i:6:p:1625-1647
More articles in Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.