Bandit bounds from stochastic variability extrema
Stephen J. Herschkorn
Statistics & Probability Letters, 1997, vol. 35, issue 3, 283-288
Abstract:
In the consideration of bandit problems with general rewards and discount sequences, we compare an arm to one whose reward distribution may be one of two degenerate distributions. For the general multi-armed case, the latter problem provides an upper bound on the optimal return. In the case of two arms with the second known and regular discounting, consideration of the two-point distribution provides a sufficient condition for stopping. We interpret these results in the context of the value of information. The results, and others in the literature, suggest that bandit thresholds (or indices) may be monotonic with respect to ordering of distributions in the convex sense.
Keywords: Bandit problems; Stochastic variability ordering; Convex ordering; Value of information
Date: 1997
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167-7152(97)00024-2
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:stapro:v:35:y:1997:i:3:p:283-288
Statistics & Probability Letters is currently edited by Somnath Datta and Hira L. Koul