Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations
Kousha Etessami (),
Alistair Stewart () and
Mihalis Yannakakis ()
Additional contact information
Kousha Etessami: School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom;
Alistair Stewart: Department of Computer Science, University of Southern California, Los Angeles, California 90089;
Mihalis Yannakakis: Department of Computer Science, Columbia University, New York, New York 10027
Mathematics of Operations Research, 2020, vol. 45, issue 1, 34–62
Abstract:
We show that one can compute the least nonnegative solution (also known as the least fixed point ) for a system of probabilistic min (max) polynomial equations, to any desired accuracy ɛ > 0 in time polynomial in both the encoding size of the system and in log(1/ ɛ ). These are Bellman optimality equations for important classes of infinite-state Markov decision processes (MDPs), including branching MDPs (BMDPs), which generalize classic multitype branching stochastic processes. We thus obtain the first polynomial time algorithm for computing, to any desired precision, optimal (maximum and minimum) extinction probabilities for BMDPs. Our algorithms are based on a novel generalization of Newton’s method, which employs linear programming in each iteration. We also provide polynomial-time (P-time) algorithms for computing an ɛ -optimal policy for both maximizing and minimizing extinction probabilities in a BMDP, whereas we note a hardness result for computing an exact optimal policy. Furthermore, improving on prior results, we provide more efficient P-time algorithms for qualitative analysis of BMDPs, that is, for determining whether the maximum or minimum extinction probability is 1, and, if so, computing a policy that achieves this. We also observe some complexity consequences of our results for branching simple stochastic games, which generalize BMDPs.
Keywords: multitype branching processes; Markov decision processes; Bellman optimality equations; generalized Newton method; olynomial time algorithms (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://doi.org/10.1287/moor.2018.0970 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:45:y:2020:i:1:p:34-62
Access Statistics for this article
More articles in Mathematics of Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().