Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Davar, Parisa; Godin, Fr\'ed\'eric; Garrido, Jose

Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Parisa Davar, Fr\'ed\'eric Godin and Jose Garrido

Abstract: This paper tackles the problem of mitigating catastrophic risk (which is risk with very low frequency but very high severity) in the context of a sequential decision making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). A policy gradient algorithm is developed, that we call POTPG. It is based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the out-performance of our method over common benchmarks, relying on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.

Date: 2024-06, Revised 2024-06
New Economics Papers: this item is included in nep-cmp and nep-rmg
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2406.15612 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2406.15612

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().