Mechanisms with learning for stochastic multi-armed bandit problems

Jain, Shweta; Bhat, Satyanath; Ghalme, Ganesh; Padmanabhan, Divya; Narahari, Y.

Mechanisms with learning for stochastic multi-armed bandit problems

Shweta Jain (), Satyanath Bhat (), Ganesh Ghalme (), Divya Padmanabhan () and Y. Narahari ()
Additional contact information
Shweta Jain: Indian Institute of Science
Satyanath Bhat: Indian Institute of Science
Ganesh Ghalme: Indian Institute of Science
Divya Padmanabhan: Indian Institute of Science
Y. Narahari: Indian Institute of Science

Indian Journal of Pure and Applied Mathematics, 2016, vol. 47, issue 2, 229-272

Abstract: Abstract The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in the context of online learning. In this article, our focus is on a specific class of problems namely stochastic MAB problems where the rewards are stochastic. In particular, we emphasize stochastic MAB problems with strategic agents. Dealing with strategic agents warrants the use of mechanism design principles in conjunction with online learning, and leads to non-trivial technical challenges. In this paper, we first provide three motivating problems arising from Internet advertising, crowdsourcing, and smart grids. Next, we provide an overview of stochastic MAB problems and key associated learning algorithms including upper confidence bound (UCB) based algorithms. We provide proofs of important results related to regret analysis of the above learning algorithms. Following this, we present mechanism design for stochastic MAB problems. With the classic example of sponsored search auctions as a backdrop, we bring out key insights in important issues such as regret lower bounds, exploration separated mechanisms, designing truthful mechanisms, UCB based mechanisms, and extension to multiple pull MAB problems. Finally we provide a bird’s eye view of recent results in the area and present a few issues that require immediate future attention.

Keywords: Multi-armed Bandit; mechanism design; learning algorithms (search for similar items in EconPapers)
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s13226-016-0186-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:indpam:v:47:y:2016:i:2:d:10.1007_s13226-016-0186-3

Ordering information: This journal article can be ordered from
https://www.springer.com/journal/13226

DOI: 10.1007/s13226-016-0186-3

Access Statistics for this article

Indian Journal of Pure and Applied Mathematics is currently edited by Nidhi Chandhoke

More articles in Indian Journal of Pure and Applied Mathematics from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().