Information-directed policy sampling for episodic Bayesian Markov decision processes
Victoria Diaz and Archis Ghate
IISE Transactions, 2025, vol. 57, issue 8, 905-919
Abstract:
We consider finite-stage Markov Decision Processes (MDPs) under incomplete information, where the decision-maker only knows that the true transition probability and reward matrices belong to given, finite sets. The decision-maker interacts with the system over a finite number of episodes. The first episode begins with a probabilistic belief about the true probability and reward matrices. This belief is updated at the end of each episode using observed events. The goal is to maximize the expected total reward earned over all episodes. In the resulting model-based episodic Bayesian MDP, it suffices to only consider (the known) policies that are optimal to each one of the possible probability and reward matrices. Nevertheless, the decision-maker should execute policies that provide information about the true probabilities and rewards (exploration), but also exploit this knowledge to increase rewards. We propose a framework called Information-Directed Policy Sampling (IDPS). In each episode, the decision-maker balances the exploitation-exploration trade-off by executing a randomized policy that minimizes a so-called convex information ratio. We derive a regret bound that is independent of state- and action-space cardinalities when the set of matrices is exogenously determined. Numerical experiments show IDPS outperforming a state-of-the-art approach called Posterior Sampling.
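Illustration (not taken from the article): the end-of-episode belief update described in the abstract can be sketched with Bayes' rule over a finite set of candidate models. The function name update_belief, the array layout, and the simplifications (stationary transition probabilities, rewards ignored, at least one candidate assigning positive probability to the observed transitions) are assumptions made for this sketch. It does not reproduce the IDPS information ratio, whose definition the abstract does not give.

```python
import numpy as np

def update_belief(belief, models, trajectory):
    """Return the posterior over candidate models after one episode.

    belief:     shape (K,), prior probabilities over K candidate models.
    models:     shape (K, S, A, S); models[k, s, a, s2] = P_k(s2 | s, a).
                (Hypothetical layout; assumes stationary transitions.)
    trajectory: iterable of (s, a, s2) transitions observed in the episode.
    """
    post = np.asarray(belief, dtype=float).copy()
    for s, a, s2 in trajectory:
        # Bayes' rule: weight each candidate by the likelihood it assigns
        # to the observed transition, then renormalize.
        post *= models[:, s, a, s2]
        post /= post.sum()
    return post

# Two candidate models, two states, one action.
models = np.array([
    [[[0.9, 0.1]], [[0.5, 0.5]]],  # candidate 0
    [[[0.2, 0.8]], [[0.5, 0.5]]],  # candidate 1
])
prior = np.array([0.5, 0.5])
posterior = update_belief(prior, models, [(0, 0, 0), (0, 0, 1)])
```

Renormalizing after every transition keeps the weights from underflowing over long episodes; for very long horizons, accumulating log-likelihoods would be a reasonable alternative.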
Date: 2025
Downloads: (external link)
http://hdl.handle.net/10.1080/24725854.2024.2392663 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:uiiexx:v:57:y:2025:i:8:p:905-919
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/uiie20
DOI: 10.1080/24725854.2024.2392663
IISE Transactions is currently edited by Jianjun Shi
More articles in IISE Transactions from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.