Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles
Aldo Gael Carranza,
Sanath Kumar Krishnamurthy and
Susan Athey
Additional contact information
Aldo Gael Carranza: Stanford U
Sanath Kumar Krishnamurthy: Stanford U
Research Papers from Stanford University, Graduate School of Business
Abstract:
Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater flexibility to model misspecification.
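The abstract's central observation, that action-independent parts of the reward do not affect which action is best, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a hypothetical decomposition of the reward into a baseline that depends only on the context plus a per-action treatment effect, and the names `greedy_action`, `simulate`, `baseline`, and `effects` are introduced here purely for illustration.

```python
import random


def greedy_action(scores):
    """Return the index of the highest-scoring action (first index on ties)."""
    return max(range(len(scores)), key=lambda a: scores[a])


def simulate(num_contexts=1000, num_actions=3, seed=0):
    """Fraction of simulated contexts in which the greedy action under full
    rewards matches the greedy action under treatment effects alone.

    Assumption (illustrative only): reward = baseline + effect, where the
    baseline is the same for every action in a given context.
    """
    rng = random.Random(seed)
    agreements = 0
    for _ in range(num_contexts):
        # Action-independent component; integers keep the arithmetic exact.
        baseline = rng.randint(-100, 100)
        # Treatment effects relative to action 0, whose effect is 0 by definition.
        effects = [0] + [rng.randint(-10, 10) for _ in range(num_actions - 1)]
        rewards = [baseline + e for e in effects]
        if greedy_action(rewards) == greedy_action(effects):
            agreements += 1
    return agreements / num_contexts


if __name__ == "__main__":
    print(simulate())
```

Because the baseline shifts every action's reward by the same amount, the ranking of actions is unchanged, so a learner only needs the differences between actions to choose well. The baseline here is drawn from a much wider range than the effects, which is the kind of action-independent variation a full reward model would also have to fit.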
Date: 2023-02
New Economics Papers: this item is included in nep-ecm and nep-exp
Downloads: (external link)
https://www.gsb.stanford.edu/faculty-research/work ... erogeneous-treatment
Persistent link: https://EconPapers.repec.org/RePEc:ecl:stabus:4081