Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling

Ariu, Kaito; Kato, Masahiro; Komiyama, Junpei; McAlinn, Kenichiro; Qin, Chao

Abstract: We consider the "policy choice" problem, known as best arm identification in the bandit literature, which Kasy and Sautmann (2021) proposed for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling, which was developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and that both the proof and the statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For Theorem 1 (1) and (2), we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling.

Date: 2021-09, Revised 2021-11
New Economics Papers: this item is included in nep-isf

Download: http://arxiv.org/pdf/2109.08229 (latest version, application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2109.08229

Series: Papers from arXiv.org