Integrating Risk-Averse and Constrained Reinforcement Learning for Robust Decision-Making in High-Stakes Scenarios

Ahmad, Moiz; Ramzan, Muhammad Babar; Omair, Muhammad; Habib, Muhammad Salman

Moiz Ahmad, Muhammad Babar Ramzan, Muhammad Omair and Muhammad Salman Habib
Additional contact information
Moiz Ahmad: Department of Industrial and Manufacturing Engineering, University of Engineering and Technology, Lahore 54700, Pakistan
Muhammad Babar Ramzan: School of Engineering and Technology, National Textile University, Faisalabad 37610, Pakistan
Muhammad Omair: Department of Materials and Production, Aalborg University, 9220 Aalborg Øst, Denmark
Muhammad Salman Habib: Institute of Knowledge Services, Center for Creative Convergence Education, Hanyang University ERICA Campus, Ansan-si 15588, Gyeonggi-do, Republic of Korea

Mathematics, 2024, vol. 12, issue 13, 1-32

Abstract: This paper considers a risk-averse Markov decision process (MDP) with non-risk constraints as a dynamic optimization framework to ensure robustness against unfavorable outcomes in high-stakes sequential decision-making situations such as disaster response. In this regard, strong duality is proved while making no assumptions on the problem’s convexity. This is necessary for some real-world issues, e.g., in the case of deprivation costs in the context of disaster relief, where convexity cannot be ensured. Our theoretical results imply that the problem can be exactly solved in a dual domain where it becomes convex. Based on our duality results, an augmented Lagrangian-based constraint handling mechanism is also developed for risk-averse reinforcement learning algorithms. The mechanism is proved to be theoretically convergent. Finally, we have also empirically established the convergence of the mechanism using a multi-stage disaster response relief allocation problem while using a fixed negative reward scheme as a benchmark.

Keywords: robust decision-making; dynamic decision-making; non-convexities; constrained reinforcement learning; augmented Lagrangian; Markov risk
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/13/1954/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/13/1954/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:13:p:1954-:d:1420914

Mathematics is currently edited by Ms. Emma He
