Robust Exploratory Stopping under Ambiguity in Reinforcement Learning
Junyan Ye, Hoi Ying Wong and Kyunghyun Park
Papers from arXiv.org
Abstract:
We propose and analyze a continuous-time robust reinforcement learning framework for optimal stopping under ambiguity. In this framework, an agent chooses a robust exploratory stopping time motivated by two objectives: robust decision-making under ambiguity and learning about the unknown environment. Here, ambiguity refers to considering multiple probability measures dominated by a reference measure, reflecting the agent's awareness that the reference measure, which represents her learned belief about the environment, may be erroneous. Using the $g$-expectation framework, we reformulate the optimal stopping problem under ambiguity as a robust exploratory control problem with Bernoulli distributed controls. We then characterize the optimal Bernoulli distributed control via backward stochastic differential equations and, based on this, construct the robust exploratory stopping time that approximates the optimal stopping time under ambiguity. Finally, we establish a policy iteration theorem and implement it as a reinforcement learning algorithm. Numerical experiments demonstrate the convergence, robustness, and scalability of our reinforcement learning algorithm across different levels of ambiguity and exploration.
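The Bernoulli distributed stopping control mentioned in the abstract can be illustrated with a minimal sketch (the function name and the constant stopping probability are illustrative, not taken from the paper): at each discrete time step the agent draws a Bernoulli variable with the current stopping probability and stops at the first success, which randomizes the stopping time and thereby induces exploration.

```python
import random

def sample_exploratory_stopping_time(p, horizon, rng=None):
    """Sample a Bernoulli-randomized stopping time: at each step,
    stop with probability p; otherwise continue. Returns the step
    index at which stopping occurs, capped at the horizon."""
    rng = rng or random.Random()
    for t in range(horizon):
        if rng.random() < p:
            return t
    return horizon  # forced stop at the terminal time

# Illustrative usage: with a constant stopping probability p, the
# stopping time is (truncated) geometric with mean roughly (1 - p) / p.
rng = random.Random(0)
times = [sample_exploratory_stopping_time(0.2, 50, rng) for _ in range(10_000)]
mean_tau = sum(times) / len(times)
```

In the paper the stopping probability is itself the optimized control (characterized via backward stochastic differential equations), rather than the fixed constant used in this sketch.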
Date: 2025-10, Revised 2026-04
New Economics Papers: this item is included in nep-cmp
Downloads: http://arxiv.org/pdf/2510.10260 (latest version, application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2510.10260
Bibliographic data for this series is maintained by arXiv administrators.