Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles
Sanath Kumar Krishnamurthy,
Vitor Hadad and
Susan Athey
Additional contact information
Sanath Kumar Krishnamurthy: Stanford University
Vitor Hadad: Stanford University
Research Papers from Stanford University, Graduate School of Business
Abstract:
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do not rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles [e.g., Foster et al. (2020a); Krishnamurthy et al. (2020)]. This allows us to design algorithms for more general function approximation classes.
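The reverting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`fit_offline_oracle`, `hoeffding_radius`, `should_revert`), the per-arm ridge regression standing in for an offline regression oracle, and the Hoeffding-style test are all assumptions made for illustration. Rewards are assumed to lie in [0, 1], and reward samples are assumed to be available for both the learned policy and the safe policy.

```python
import numpy as np


def fit_offline_oracle(X, a, r, n_arms, ridge=1.0):
    """Hypothetical offline regression oracle: one ridge regression per arm,
    fit on logged (context, arm, reward) data. Returns an (n_arms, d) weight matrix."""
    d = X.shape[1]
    W = np.zeros((n_arms, d))
    for k in range(n_arms):
        Xk, rk = X[a == k], r[a == k]
        # Regularized normal equations; still solvable if arm k has no data.
        W[k] = np.linalg.solve(Xk.T @ Xk + ridge * np.eye(d), Xk.T @ rk)
    return W


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding confidence radius for the mean of n rewards in [0, 1]."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))


def should_revert(learned_rewards, safe_rewards, delta=0.05):
    """Illustrative test (an assumption, not the paper's rule): revert to the safe
    policy when its lower confidence bound exceeds the learned policy's upper bound."""
    lo_safe = np.mean(safe_rewards) - hoeffding_radius(len(safe_rewards), delta)
    hi_learned = np.mean(learned_rewards) + hoeffding_radius(len(learned_rewards), delta)
    return bool(lo_safe > hi_learned)


# Usage on synthetic data: the learned policy earns clearly less than the safe one.
rng = np.random.default_rng(0)
safe = rng.binomial(1, 0.7, size=2000)
learned = rng.binomial(1, 0.4, size=2000)
revert = should_revert(learned, safe)
```

The test is deliberately conservative: it reverts only when the observed gap is larger than what sampling noise alone could plausibly explain at level `delta`, so a policy is never flagged against itself.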
Date: 2021-02
New Economics Papers: this item is included in nep-ecm
Downloads: (external link)
https://www.gsb.stanford.edu/faculty-research/work ... s-offline-regression
Persistent link: https://EconPapers.repec.org/RePEc:ecl:stabus:3951