A Practical Guide of Off-Policy Evaluation for Bandit Problems

Kato, Masahiro; Abe, Kenshi; Ariu, Kaito; Yasui, Shota

A Practical Guide of Off-Policy Evaluation for Bandit Problems

Masahiro Kato, Kenshi Abe, Kaito Ariu and Shota Yasui

Abstract: Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods for bandit problems has garnered attention. For the theoretical guarantees of an estimator of the policy value, the OPE methods require various conditions on the target policy and policy used for generating the samples. However, existing studies did not carefully discuss the practical situation where such conditions hold, and the gap between them remains. This paper aims to show new results for bridging the gap. Based on the properties of the evaluation policy, we categorize OPE situations. Then, among practical applications, we mainly discuss the best policy selection. For the situation, we propose a meta-algorithm based on existing OPE estimators. We investigate the proposed concepts using synthetic and open real-world datasets in experiments.

Date: 2020-10
New Economics Papers: this item is included in nep-ecm
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2010.12470 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2010.12470

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().