Mean-Variance Tradeoffs in an Undiscounted MDP
Matthew J. Sobel
Additional contact information
Matthew J. Sobel: State University of New York at Stony Brook, Stony Brook, New York
Operations Research, 1994, vol. 42, issue 1, 175-183
Abstract:
A stationary policy and an initial state in a Markov decision process (MDP) induce a stationary probability distribution of the reward. The problem analyzed here is that of generating the Pareto optima, in the sense of a high mean and a low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program that has the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
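To make the linear-programming idea concrete, here is a minimal sketch (not from the paper) of how the Pareto frontier described in the abstract can be traced numerically. It uses the standard state-action-frequency LP for gain-rate optimization, adds the one extra equality constraint that pins the stationary mean reward, and minimizes the second moment of the reward, which for a fixed mean minimizes the variance. The two-state, two-action MDP, its numbers, and the helper min_variance_at_mean are all hypothetical; the solver is scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action unichain MDP (illustrative numbers only).
P = np.array([  # P[s, a, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
r = np.array([[1.0, 4.0], [2.0, 6.0]])  # r[s, a] = one-step reward
S, A = r.shape
n = S * A  # one LP variable x[s, a] per (state, action) pair

# Flow-balance rows of the gain-rate LP:
#   sum_a x[s', a] - sum_{s, a} P(s' | s, a) x[s, a] = 0  for every s'.
flow = np.zeros((S, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            j = s * A + a
            flow[sp, j] = (1.0 if s == sp else 0.0) - P[s, a, sp]

norm = np.ones((1, n))           # frequencies sum to one
mean_row = r.flatten()[None, :]  # the one added constraint: pins the mean
c = (r ** 2).flatten()           # objective: second moment of the reward

def min_variance_at_mean(m):
    """Minimum stationary variance at target mean m (None if infeasible)."""
    A_eq = np.vstack([flow, norm, mean_row])
    b_eq = np.concatenate([np.zeros(S), [1.0], [m]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    # Var = E[r^2] - m^2, since the mean is pinned to m by the constraint.
    return res.fun - m ** 2 if res.success else None

# Trace the mean-variance frontier over a grid of target means.
for m in np.linspace(2.0, 5.0, 7):
    v = min_variance_at_mean(m)
    if v is not None:
        print(f"mean {m:.2f} -> min variance {v:.4f}")
```

Sweeping the target mean over its feasible range and keeping the non-dominated (mean, variance) pairs yields the Pareto optima; an infeasible target simply makes the LP infeasible, which the sketch reports as None.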
Keywords: dynamic programming: Markov: mean-variance tradeoff; programming: multiple criteria: mean-variance tradeoff
Date: 1994
Citations: 6 (tracked in EconPapers)
Downloads: http://dx.doi.org/10.1287/opre.42.1.175 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:42:y:1994:i:1:p:175-183