Safely Exploring Novel Actions in Recommender Systems via Deployment-Efficient Policy Learning
Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno and Takuma Udagawa
Additional contact information 
Haruka Kiyohara: Cornell University
Yusuke Narita: Yale University
Yuta Saito: Hanjuku-kaso Co., Ltd.
Kei Tateno: Sony Group Corporation
Takuma Udagawa: Sony Group Corporation
No 2466, Cowles Foundation Discussion Papers from Cowles Foundation for Research in Economics, Yale University
Abstract:
In many real-world recommender systems, novel items are added frequently over time. It is widely acknowledged that sufficiently presenting novel actions is important for improving long-term user engagement. Recent work builds on Off-Policy Learning (OPL), which trains a policy from logged data alone; however, existing methods can be unsafe in the presence of novel actions. Our goal is to develop a framework that enforces exploration of novel actions with a safety guarantee. To this end, we first develop Safe Off-Policy Policy Gradient (Safe OPG), a model-free safe OPL method based on high-confidence off-policy evaluation. In our first experiment, we observe that Safe OPG almost always satisfies the safety requirement, even when existing methods violate it greatly. However, the results also reveal that Safe OPG tends to be too conservative, suggesting a difficult tradeoff between guaranteeing safety and exploring novel actions. To overcome this tradeoff, we also propose a novel framework called Deployment-Efficient Policy Learning for Safe User Exploration, which leverages a safety margin and gradually relaxes safety regularization over multiple (but not many) deployments. Our framework thus enables exploration of novel actions while guaranteeing safe implementation of recommender systems.
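As a rough illustration of what "high-confidence off-policy evaluation" can mean in general, the sketch below computes an inverse propensity score (IPS) estimate of a candidate policy's value from logged data, together with a one-sided lower confidence bound, and checks that bound against a safety threshold. This is not the paper's Safe OPG algorithm: the normal-approximation bound, the function names `ips_lower_bound` and `is_safe`, and the `margin` parameter are all assumptions introduced here for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def ips_lower_bound(rewards, logging_probs, target_probs, delta=0.05):
    """Return (IPS estimate, approximate 1 - delta lower bound) of a policy's value.

    rewards[i]       : reward observed for the i-th logged action
    logging_probs[i] : probability the logging policy gave to that action
    target_probs[i]  : probability the candidate policy gives to that action
    The bound uses a normal approximation (an assumption of this sketch).
    """
    weighted = [
        r * pt / pl for r, pl, pt in zip(rewards, logging_probs, target_probs)
    ]
    estimate = mean(weighted)
    z = NormalDist().inv_cdf(1.0 - delta)  # one-sided critical value
    half_width = z * stdev(weighted) / math.sqrt(len(weighted))
    return estimate, estimate - half_width


def is_safe(lower_bound, baseline_value, margin=0.0):
    """Hypothetical safety check: the lower bound must not fall below
    the baseline value by more than the allowed margin."""
    return lower_bound >= baseline_value - margin


# Toy logged data (made up for illustration only).
rewards = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [0.5, 0.5, 0.5, 0.5]
estimate, lower = ips_lower_bound(rewards, logging_probs, target_probs)
print(estimate, lower, is_safe(lower, baseline_value=0.3))
```

In this sketch, a larger `margin` admits candidate policies whose lower bound sits further below the baseline, which is one simple way to picture loosening a safety constraint across successive deployments; how the paper actually schedules this is not stated in the abstract.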
Pages: 17 pages
Date: 2025-10-09
Downloads: (external link)
https://cowles.yale.edu/sites/default/files/2025-10/d2466.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:cwl:cwldpp:2466
Ordering information: This working paper can be ordered from
Cowles Foundation, Yale University, Box 208281, New Haven, CT 06520-8281 USA