Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

Hamidi, Nima; Bayati, Mohsen

Nima Hamidi and Mohsen Bayati
Nima Hamidi: Department of Statistics, Stanford University, Stanford, California 94305
Mohsen Bayati: Operations, Information, and Technology, Graduate School of Business, Stanford University, Stanford, California 94305

Operations Research, 2023, vol. 71, issue 4, 1434-1439

Abstract: In this note, we introduce a general version of the well-known elliptical potential lemma, a widely used technique in the analysis of algorithms in sequential learning and decision-making problems. We consider a stochastic linear bandit setting where decision makers sequentially choose among a set of given actions, observe their noisy rewards, and aim to maximize their cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement, which is a highly nontrivial extension for a number of reasons: unlike the Gaussian case, there is no closed-form solution for the covariance matrix of the posterior distribution, the covariance matrix is not a deterministic function of the actions, and the covariance matrix is not decreasing with respect to the semidefinite inequality. Although this result is of broad interest, we showcase an application of it to prove an improved Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets where prior and noise distributions are general. This bound is minimax optimal up to constants.
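
For readers unfamiliar with the lemma, the following is a minimal sketch of the classical, deterministic form that the abstract takes as its starting point; it is not the general version introduced in the note. It assumes the standard statement: with V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T, the sum over t of min(1, x_t^T V_{t-1}^{-1} x_t) is at most 2 log(det V_n / det V_0), which in turn is at most 2 d log(1 + n L^2 / (d lam)) when every action has norm at most L. The function name `elliptical_potential`, the regularizer `lam`, and the random Gaussian actions are illustrative assumptions, not taken from the article.

```python
import math
import random


def elliptical_potential(actions, lam=1.0):
    """Numerically evaluate the classical elliptical potential and its bounds.

    Hypothetical helper for illustration only. Returns a tuple
    (potential, log_det_bound, closed_form_bound).
    """
    n = len(actions)
    d = len(actions[0])
    # V_inv tracks the inverse of V_t, starting from V_0 = lam * I.
    V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    potential = 0.0
    log_det_ratio = 0.0  # log(det V_t / det V_0)

    for x in actions:
        # u = V_{t-1}^{-1} x and w = x^T V_{t-1}^{-1} x
        u = [sum(V_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        w = sum(x[i] * u[i] for i in range(d))
        potential += min(1.0, w)
        # Matrix determinant lemma: det V_t = det V_{t-1} * (1 + w)
        log_det_ratio += math.log1p(w)
        # Sherman-Morrison rank-one update of the inverse
        for i in range(d):
            for j in range(d):
                V_inv[i][j] -= u[i] * u[j] / (1.0 + w)

    # L^2 is taken as the largest squared norm among the supplied actions.
    max_sq_norm = max(sum(c * c for c in x) for x in actions)
    closed_form = 2.0 * d * math.log(1.0 + n * max_sq_norm / (d * lam))
    return potential, 2.0 * log_det_ratio, closed_form


random.seed(0)
actions = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(500)]
potential, log_det_bound, closed_form = elliptical_potential(actions)
print(potential, log_det_bound, closed_form)
```

Under a Gaussian prior and Gaussian noise, the inverse of V_t is proportional to the posterior covariance, so it is a deterministic function of the chosen actions. That is the property the sketch relies on and, per the abstract, the one that fails for general distributions.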

Keywords: Stochastic Models; elliptical potential lemma; stochastic linear bandit; Thompson sampling
Date: 2023

Downloads: http://dx.doi.org/10.1287/opre.2022.2274 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:71:y:2023:i:4:p:1434-1439