Robust offline reinforcement learning with heavy-tailed rewards
Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo and Chengchun Shi
LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library
Abstract:
This paper aims to improve the robustness of offline reinforcement learning (RL) under heavy-tailed rewards, a common situation in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the incorporation of the median-of-means method into offline RL, which enables straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also handles heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
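For readers unfamiliar with the median-of-means method named in the abstract, the following is a minimal generic sketch of it applied to a one-dimensional sample of rewards. It is not the ROAM or ROOM implementation (see the repository linked above for that); the function name `median_of_means`, its parameters, and the random block assignment are illustrative assumptions.

```python
import numpy as np


def median_of_means(samples, n_blocks, rng=None):
    """Generic median-of-means estimate of a mean (illustrative sketch only).

    The sample is shuffled, split into `n_blocks` nearly equal blocks,
    each block is averaged, and the median of the block means is returned
    together with the block means themselves.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 1:
        raise ValueError("samples must be one-dimensional")
    if not 1 <= n_blocks <= samples.size:
        raise ValueError("n_blocks must be between 1 and the sample size")

    # Assumption: blocks are formed by a random permutation of the sample.
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(samples)

    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means)), block_means


# Hypothetical data: 99 ordinary rewards and one extreme outlier.
rewards = np.concatenate([np.ones(99), [1e6]])
estimate, block_means = median_of_means(rewards, n_blocks=10, rng=0)
plain_mean = float(rewards.mean())
```

In this toy sample the outlier can contaminate only one of the ten blocks, so the median of the block means stays at the typical reward while the plain sample mean is pulled far away. The spread of `block_means` also gives a rough sense of estimator variability, which is one plausible reading of the "uncertainty estimation" mentioned in the abstract, not a description of the paper's actual procedure.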
Keywords: Rights; Retention
JEL-codes: C1
Pages: 9 pages
Date: 2024-05-02
Published in Proceedings of Machine Learning Research, vol. 238, 2 May 2024, pp. 541-549. ISSN: 2640-3498
Downloads: (external link)
http://eprints.lse.ac.uk/122740/ Open access version. (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:122740