Robust offline reinforcement learning with heavy-tailed rewards

Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo and Chengchun Shi

LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library

Abstract: This paper aims to improve the robustness of offline reinforcement learning (RL) in settings with heavy-tailed rewards, which are common in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to both frameworks is the incorporation of the median-of-means method into offline RL, which yields a straightforward uncertainty estimate for the value function estimator. This both adheres to the principle of pessimism in OPO and handles heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that the two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation is available at https://github.com/Mamba413/ROOM.
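The abstract names median-of-means (MoM) as the core ingredient without spelling it out. The Python sketch below shows the generic MoM estimator: shuffle the observed rewards, split them into disjoint blocks, average each block, and take the median of the block means, so a single extreme reward can distort at most one block. The function names, the fixed `seed`, and especially the hypothetical `pessimistic_estimate` (MoM estimate minus `penalty` times the standard deviation of the block means) are illustrative assumptions, not the ROAM/ROOM implementation; the paper's value-function estimator and uncertainty quantifier may be constructed differently.

```python
import random
import statistics
from typing import List, Sequence


def block_means(samples: Sequence[float], num_blocks: int, seed: int = 0) -> List[float]:
    """Shuffle the samples, split them into num_blocks disjoint blocks, and average each block."""
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    data = list(samples)
    random.Random(seed).shuffle(data)
    # Block k holds every num_blocks-th element starting at index k,
    # so every block is non-empty and block sizes differ by at most one.
    return [statistics.fmean(data[k::num_blocks]) for k in range(num_blocks)]


def median_of_means(samples: Sequence[float], num_blocks: int, seed: int = 0) -> float:
    """Median of the block means: a mean estimate that a few extreme values cannot drag far."""
    return statistics.median(block_means(samples, num_blocks, seed))


def pessimistic_estimate(samples: Sequence[float], num_blocks: int,
                         penalty: float = 1.0, seed: int = 0) -> float:
    """HYPOTHETICAL pessimistic value: MoM estimate minus penalty * stdev of block means.

    Illustrative only; not necessarily the uncertainty penalty used by ROOM.
    """
    if num_blocks < 2:
        raise ValueError("need at least two blocks to measure spread")
    means = block_means(samples, num_blocks, seed)
    return statistics.median(means) - penalty * statistics.stdev(means)


if __name__ == "__main__":
    rewards = [1.0] * 99 + [10_000.0]  # one extreme reward among 100
    print(statistics.fmean(rewards))                  # 100.99
    print(median_of_means(rewards, num_blocks=10))    # 1.0
    print(pessimistic_estimate(rewards, num_blocks=10))
```

With one reward of 10,000 among 100 rewards of 1.0, the sample mean is pulled to 100.99, while the median-of-means estimate with 10 blocks stays at 1.0, because the outlier lands in only one of the ten blocks. The number of blocks trades robustness (more blocks tolerate more corrupted blocks) against variance (each block mean uses fewer samples).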

Keywords: Rights; Retention
JEL-codes: C1
Pages: 9 pages
Date: 2024-05-02

Published in Proceedings of Machine Learning Research, vol. 238, pp. 541-549, 2 May 2024. ISSN: 2640-3498

Downloads:
http://eprints.lse.ac.uk/122740/ Open access version (PDF)



Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:122740


More papers in LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library, Portugal Street, London, WC2A 2HD, U.K.
Bibliographic data for series maintained by LSERO Manager.

 
Page updated 2025-03-31
Handle: RePEc:ehl:lserod:122740