Quantile Markov Decision Processes

Xiaocheng Li, Huaiyang Zhong and Margaret L. Brandeau
Additional contact information
Xiaocheng Li: Department of Management Science and Engineering, Stanford University, Stanford, California 94305
Huaiyang Zhong: Department of Management Science and Engineering, Stanford University, Stanford, California 94305
Margaret L. Brandeau: Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Operations Research, 2022, vol. 70, issue 3, 1428-1447

Abstract: The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, in which patients aim to balance the potential benefits and risks of the treatment.
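
As a concrete illustration of the quantile objective described in the abstract, the following Python sketch estimates and compares the tau-quantile of cumulative reward across the deterministic policies of a toy two-state, finite-horizon MDP. It is a brute-force Monte Carlo illustration only, not the dynamic programming algorithm developed in the paper; the transition probabilities P, rewards R, horizon, and quantile level TAU are hypothetical values chosen solely to make the example run.

import itertools
import numpy as np

# Toy finite-horizon MDP (all numbers are made up for illustration).
# P[s][a]: dict mapping next state -> transition probability; R[s][a]: immediate reward.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}},
    1: {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}},
}
R = {
    0: {0: 1.0, 1: 2.0},   # action 1 earns more on average ...
    1: {0: 0.5, 1: -3.0},  # ... but exposes the agent to a large loss in state 1
}
HORIZON, TAU, N_SAMPLES = 3, 0.25, 5_000
rng = np.random.default_rng(0)

def simulate(policy, s0=0):
    """Sample one cumulative reward under a stationary deterministic policy."""
    s, total = s0, 0.0
    for _ in range(HORIZON):
        a = policy[s]
        total += R[s][a]
        next_states, probs = zip(*P[s][a].items())
        s = rng.choice(next_states, p=probs)
    return total

def tau_quantile(policy):
    """Monte Carlo estimate of the TAU-quantile (and mean) of cumulative reward."""
    returns = [simulate(policy) for _ in range(N_SAMPLES)]
    return np.quantile(returns, TAU), float(np.mean(returns))

# Brute-force search over the four stationary deterministic policies:
# keep the one whose estimated TAU-quantile of cumulative reward is largest.
policies = [dict(zip(P, acts)) for acts in itertools.product([0, 1], repeat=len(P))]
best = max(policies, key=lambda pol: tau_quantile(pol)[0])
q, m = tau_quantile(best)
print(f"policy maximizing the {TAU}-quantile: {best}; quantile ~ {q:.2f}, mean ~ {m:.2f}")

A policy that maximizes the lower-tail quantile can differ from the one that maximizes expected reward, which is the distinction the QMDP formulation is designed to capture; the paper's algorithm obtains the optimal quantile policy by dynamic programming rather than by enumeration and simulation as above.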

Keywords: Special Issue: Mathematical Models of Individual and Group Decision Making in Operations Research (in honor of Kenneth Arrow); Markov decision process; dynamic programming; quantile; risk measure; medical decision making
Date: 2022

Downloads: http://dx.doi.org/10.1287/opre.2021.2123 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:70:y:2022:i:3:p:1428-1447
