Off-policy Evaluation with General Logging Policies: Implementation at Mercari
Yusuke Narita, Kyohei Okumura, Akihiro Shimizu and Kohei Yata
Discussion papers from Research Institute of Economy, Trade and Industry (RIETI)
Abstract:
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using log data from a different policy. We extend its applicability by developing an OPE method for a class of logging policies with either full or deficient support in contextual-bandit settings. This class includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate coupon targeting policies used by a major online platform and show how to improve the existing policy.
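For orientation, the textbook inverse propensity weighting (IPW) estimator, which works when the logging policy has full support, can be sketched as below. This is not the authors' method. It is a minimal sketch assuming discrete actions and known logging probabilities, and the names ipw_value and has_full_support are hypothetical. It also shows why deterministic logging is hard: when the logging policy never takes an action that the counterfactual policy takes, IPW has no data on that action and is biased.

```python
import numpy as np


def ipw_value(actions, rewards, logging_probs, target_probs):
    """Standard IPW estimate of a target policy's mean reward (not the paper's method).

    actions:       (n,) integer array, action chosen by the logging policy in each round
    rewards:       (n,) array, reward observed for the chosen action
    logging_probs: (n, K) array, logging policy's probability of each action per context
    target_probs:  (n, K) array, target (counterfactual) policy's probability of each action
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    logging_probs = np.asarray(logging_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    rows = np.arange(len(actions))
    p_log = logging_probs[rows, actions]  # propensity of each logged action
    if np.any(p_log <= 0):
        raise ValueError("a logged action has zero logging probability; the log is inconsistent")
    weights = target_probs[rows, actions] / p_log  # importance weights
    return float(np.mean(weights * rewards))


def has_full_support(logging_probs, target_probs):
    """True if every action the target policy can take had positive logging probability."""
    logging_probs = np.asarray(logging_probs)
    target_probs = np.asarray(target_probs)
    return bool(np.all((target_probs == 0) | (logging_probs > 0)))


# Hypothetical toy log: action 0 always pays 1, action 1 always pays 0.
actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 0.0])
uniform_logging = np.full((4, 2), 0.5)          # stochastic logging with full support
always_action_0 = np.tile([1.0, 0.0], (4, 1))   # counterfactual policy: always pick action 0

# Hypothetical deterministic logging policy that always picks action 1.
det_actions = np.array([1, 1, 1, 1])
det_rewards = np.array([0.0, 0.0, 0.0, 0.0])
det_logging = np.tile([0.0, 1.0], (4, 1))       # action 0 is never logged: deficient support
```

Under uniform logging, ipw_value returns 1.0 for the always-action-0 policy, its true value in this toy log. Under the deterministic log, has_full_support returns False and ipw_value returns 0.0 even though action 0 pays 1, which illustrates the deficient-support setting the paper targets.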
Pages: 26 pages
Date: 2022-10
Downloads: https://www.rieti.go.jp/jp/publications/dp/22e097.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:eti:dpaper:22097
Bibliographic data for series maintained by TANIMOTO, Toko.