Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn and Chao Qin
Papers from arXiv.org
Abstract:
We consider the "policy choice" problem, otherwise known as best arm identification in the bandit literature, proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling, the algorithm they develop for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and that both the proof and the statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For Theorem 1 (1) and (2), we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling with respect to it.
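As a hedged illustration of the sampling rule discussed above: in Kasy and Sautmann (2021), exploration sampling assigns arm d in round t with probability q_t(d) = p_t(d)(1 - p_t(d)) / Σ_{d'} p_t(d')(1 - p_t(d')), where p_t(d) is the posterior probability that d is the best arm. The sketch below assumes Bernoulli outcomes with independent Beta(1, 1) priors and estimates p_t(d) by Monte Carlo; the function names, the default number of draws, and the uniform fallback are hypothetical choices, not taken from either paper.

```python
import random


def prob_best(successes, failures, n_draws=10_000, seed=0):
    """Monte Carlo estimate of p_t(d), the posterior probability that arm d
    has the highest success rate, assuming independent Beta(1, 1) priors
    and Bernoulli outcomes (an illustrative assumption)."""
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        # One joint draw of success rates from the Beta posteriors.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        # Credit the arm whose sampled rate is highest.
        wins[max(range(k), key=draws.__getitem__)] += 1
    return [w / n_draws for w in wins]


def exploration_sampling_shares(p):
    """Assignment shares q_t(d) proportional to p_t(d) * (1 - p_t(d))."""
    weights = [pd * (1 - pd) for pd in p]
    total = sum(weights)
    if total == 0:
        # All posterior mass on one arm: fall back to uniform assignment
        # (a hypothetical convention, not from the paper).
        return [1 / len(p)] * len(p)
    return [w / total for w in weights]


if __name__ == "__main__":
    p = prob_best(successes=[30, 25, 10], failures=[20, 25, 40])
    q = exploration_sampling_shares(p)
    print("P(best):", [round(x, 3) for x in p])
    print("exploration sampling shares:", [round(x, 3) for x in q])
```

Because p(1 - p) is small when p is close to 0 or 1, the rule moves assignment away from an arm that is already very likely to be best, and one can check that no arm ever receives a share above 1/2.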
Date: 2021-09, Revised 2021-11
New Economics Papers: this item is included in nep-isf
Citations: 8 (as listed in EconPapers)
Downloads: http://arxiv.org/pdf/2109.08229 (latest version, PDF)
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2109.08229
Bibliographic data for series maintained by arXiv administrators.