EconPapers    
Economics at your fingertips  
 

Debiased Off-Policy Evaluation for Recommendation Systems

Yusuke Narita (), Shota Yasui and Kohei Yata

Papers from arXiv.org

Abstract: Efficient methods to evaluate new algorithms are critical for improving interactive bandit and reinforcement learning systems such as recommendation systems. A/B tests are reliable, but are time- and money-consuming, and entail a risk of failure. In this paper, we develop an alternative method, which predicts the performance of algorithms given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of a counterfactual algorithm at a rate of $\sqrt{N}$, as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, thus allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are actually important. We validate our method by a simulation experiment about reinforcement learning. We finally apply it to improve advertisement design by a major advertisement company. We find that our method produces smaller mean squared errors than state-of-the-art methods.

Date: 2020-02, Revised 2021-08
New Economics Papers: this item is included in nep-big and nep-cmp
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2002.08536 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2002.08536

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().

 
Page updated 2025-03-19
Handle: RePEc:arx:papers:2002.08536