Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time
Mengdi Wang
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540
Mathematics of Operations Research, 2020, vol. 45, issue 2, 517-546
Abstract:
We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value–policy duality, the algorithm adaptively samples state–action–state transitions and makes exponentiated primal–dual updates. We show that it finds an ε-optimal policy in nearly linear worst-case runtime for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in certain special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ε-optimal policy using sample size and runtime linear in the total number of state–action pairs, which is sublinear in the input size. These results provide a new avenue and complexity benchmarks for solving stochastic dynamic programs.
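As a rough illustration of the mechanism the abstract describes, the sketch below combines adaptive sampling of state–action–state transitions with an exponentiated (multiplicative-weights) dual update and a stochastic primal update for a small tabular discounted MDP. This is not the paper's algorithm: the function name, the step sizes, and the simplified dual update (the paper analyzes a carefully importance-weighted variant with tuned parameters) are assumptions for illustration only.

```python
import numpy as np

def randomized_primal_dual_mdp(P, r, gamma, n_iters=200_000,
                               alpha=0.01, beta=0.01, seed=0):
    """Hedged sketch of a stochastic primal-dual method for a discounted MDP.

    P: transition tensor, shape (S, A, S); r: rewards, shape (S, A);
    gamma: discount factor. alpha/beta are illustrative step sizes,
    not the paper's tuned values.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    v = np.zeros(S)                      # primal iterate: value estimates
    mu = np.full((S, A), 1.0 / (S * A))  # dual iterate: distribution over (s, a)

    for _ in range(n_iters):
        # Adaptive sampling: draw (s, a) from the current dual iterate,
        # then draw one next state s' from the transition model.
        idx = rng.choice(S * A, p=mu.ravel())
        s, a = divmod(idx, A)
        s_next = rng.choice(S, p=P[s, a])

        # Sampled residual of the LP constraint v >= r + gamma * P v.
        delta = r[s, a] + gamma * v[s_next] - v[s]

        # Exponentiated dual update on the simplex (simplified here:
        # no importance weighting), followed by renormalization.
        mu[s, a] *= np.exp(beta * delta)
        mu /= mu.sum()

        # Stochastic primal step along the sampled Lagrangian gradient.
        v[s] += alpha
        v[s_next] -= alpha * gamma

    # Greedy policy read off from the dual (occupancy-like) weights.
    return v, mu, mu.argmax(axis=1)
```

For a quick sanity check, one can build a random MDP with, e.g., P = rng.dirichlet(np.ones(S), size=(S, A)) and r = rng.random((S, A)), run the sketch, and compare the returned greedy policy against plain value iteration.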
Keywords: Markov decision process; randomized algorithm; linear programming; duality; primal–dual method; runtime complexity; stochastic approximation
Date: 2020
Downloads: https://doi.org/10.1287/moor.2019.1000 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormoor:v:45:y:2020:i:2:p:517-546