Learning to Optimize via Information-Directed Sampling

Daniel Russo and Benjamin Van Roy
Additional contact information
Daniel Russo: Graduate School of Business, Columbia University, New York, New York 10027
Benjamin Van Roy: Stanford University, Stanford, California 94305

Operations Research, 2018, vol. 66, issue 1, 230-252

Abstract: We propose information-directed sampling, a new approach to online optimization problems in which a decision maker must balance between exploration and exploitation while learning from partial feedback. Each action is sampled in a manner that minimizes the ratio between squared expected single-period regret and a measure of information gain: the mutual information between the optimal action and the next observation. We establish an expected regret bound for information-directed sampling that applies across a very general class of models and scales with the entropy of the optimal action distribution. We illustrate through simple analytic examples how information-directed sampling accounts for kinds of information that alternative approaches do not adequately address and that this can lead to dramatic performance gains. For the widely studied Bernoulli, Gaussian, and linear bandit problems, we demonstrate state-of-the-art simulation performance. The electronic companion is available at https://doi.org/10.1287/opre.2017.1663.
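
The selection rule described in the abstract can be illustrated with a minimal sketch. It assumes the per-action expected regret (`delta`) and information gain (`gain`) have already been computed from some posterior; those inputs, the function name `ids_distribution`, and the grid search over mixing weights are all illustrative assumptions, not the authors' implementation. The sketch also relies on the result, reported in the full paper, that a minimizing distribution supported on at most two actions exists, so it only searches over pairs of actions.

```python
import numpy as np


def ids_distribution(delta, gain, grid=1001):
    """Approximate the distribution pi minimizing (pi @ delta)**2 / (pi @ gain).

    delta: expected single-period regret of each action (hypothetical input).
    gain:  information gain of each action (hypothetical input).
    Searches over pairs of actions and a grid of mixing weights.
    """
    delta = np.asarray(delta, dtype=float)
    gain = np.asarray(gain, dtype=float)
    k = len(delta)
    q = np.linspace(0.0, 1.0, grid)  # probability placed on the first action of a pair

    best_ratio, best_pi = np.inf, None
    for i in range(k):
        for j in range(i, k):
            d = q * delta[i] + (1 - q) * delta[j]  # mixed expected regret
            g = q * gain[i] + (1 - q) * gain[j]    # mixed information gain
            with np.errstate(divide="ignore", invalid="ignore"):
                ratio = np.where(g > 0, d ** 2 / g, np.inf)
            idx = int(np.argmin(ratio))
            if ratio[idx] < best_ratio:
                best_ratio = float(ratio[idx])
                best_pi = np.zeros(k)
                best_pi[i] += q[idx]
                best_pi[j] += 1 - q[idx]

    if best_pi is None:
        # No action is informative: fall back to the lowest-regret action.
        best_pi = np.zeros(k)
        best_pi[int(np.argmin(delta))] = 1.0
    return best_pi, best_ratio


# Toy inputs: action 0 has low regret but reveals nothing, action 1 is informative.
pi, ratio = ids_distribution([0.05, 0.2], [0.0, 1.0])
```

In this toy case the rule mixes the two actions (roughly two thirds of the probability on action 0), reaching a ratio of about 0.03, whereas always playing the informative action gives 0.04. A finer grid or a closed-form solve per pair could replace the grid search.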

Keywords: online optimization; multi-armed bandit; exploration/exploitation; information theory
Date: 2018

Downloads: (external link)
https://doi.org/10.1287/opre.2017.1663 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:66:y:2018:i:1:p:230-252

Operations Research is published by INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:oropre:v:66:y:2018:i:1:p:230-252