A modern Bayesian look at the multi‐armed bandit

Steven L. Scott

Applied Stochastic Models in Business and Industry, 2010, vol. 26, issue 6, 639-658

Abstract: A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
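
To make the heuristic concrete, the sketch below (not taken from the article; all names, priors, and payoff rates are illustrative) shows randomized probability matching, often called Thompson sampling, for a two-armed Bernoulli bandit with Beta(1, 1) priors. Drawing one sample from each arm's Beta posterior and playing the arm with the largest draw selects each arm with exactly its posterior probability of being optimal.

    import numpy as np

    def choose_arm(successes, failures, rng):
        # One draw from each arm's Beta posterior; picking the argmax
        # selects each arm with its posterior probability of being optimal.
        draws = rng.beta(successes + 1, failures + 1)  # Beta(1, 1) priors
        return int(np.argmax(draws))

    # Illustrative two-armed experiment with made-up payoff probabilities.
    rng = np.random.default_rng(0)
    true_rates = np.array([0.05, 0.07])
    successes = np.zeros(2)
    failures = np.zeros(2)

    for _ in range(10_000):
        arm = choose_arm(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    print("pulls per arm:", successes + failures)

Because allocation concentrates on arms as their posteriors sharpen, most pulls drift toward the better arm while the worse arm still receives occasional exploratory traffic.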

Date: 2010
Citations: 23 (listed in EconPapers)

Downloads (external link): https://doi.org/10.1002/asmb.874

Persistent link: https://EconPapers.repec.org/RePEc:wly:apsmbi:v:26:y:2010:i:6:p:639-658

More articles in Applied Stochastic Models in Business and Industry from John Wiley & Sons
Bibliographic data for this series is maintained by Wiley Content Delivery.

 
Handle: RePEc:wly:apsmbi:v:26:y:2010:i:6:p:639-658