Solving Markov Decision Processes via Simulation

Gosavi, Abhijit

Abhijit Gosavi: Missouri University of Science and Technology

Chapter 13 in Handbook of Simulation Optimization, Springer, 2015, pp. 341-379

Abstract: This chapter presents an overview of simulation-based techniques for solving Markov decision processes (MDPs). MDPs model problems of sequential decision-making under uncertainty, in which the decisions made in each state collectively affect the trajectory of states visited by the system over a time horizon of interest. Traditionally, MDPs have been solved via dynamic programming (DP), which requires a transition probability model that is difficult to derive in many realistic settings. Using simulation to solve MDPs allows us to bypass the transition probability model and to solve large-scale MDPs considered intractable for traditional DP. The simulation-based methodology for solving MDPs, which like DP is rooted in the Bellman equations, goes by names such as reinforcement learning, neuro-DP, and approximate or adaptive DP. We begin with a description of algorithms for infinite-horizon discounted reward MDPs, followed by algorithms for infinite-horizon average reward MDPs. We then discuss finite-horizon MDPs. For each problem considered, we present a step-by-step description of a selected group of algorithms. In making this selection, we have attempted to blend the classical with more recent developments. Finally, after touching upon extensions and convergence theory, we conclude with a brief summary of some applications and directions for future research.
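
Illustrative note (not taken from the chapter): the idea of bypassing the transition probability model can be sketched with tabular Q-learning, a standard reinforcement learning method for discounted reward MDPs. The sketch below is a minimal example under stated assumptions. The two-state, two-action MDP, the function names `simulate_step`, `q_learning`, and `greedy_policy`, the discount factor 0.8, and the step-size rule are all hypothetical choices made for illustration. The learner only calls the simulator and never reads the transition probabilities.

```python
import random

GAMMA = 0.8  # discount factor (assumed for this toy example)


def simulate_step(state, action, rng):
    """Hypothetical simulator for a toy 2-state, 2-action MDP.

    Action 0 tries to stay in the current state and action 1 tries to
    switch; the attempt succeeds with probability 0.9. Arriving in
    state 1 pays a reward of 1, arriving in state 0 pays 0.
    """
    intended = state if action == 0 else 1 - state
    next_state = intended if rng.random() < 0.9 else 1 - intended
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def q_learning(num_steps=50_000, seed=0):
    """Estimate Q-values from simulated transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # q[state][action]
    visits = [[0, 0], [0, 0]]      # visit counts per state-action pair
    state = 0
    for _ in range(num_steps):
        action = rng.randrange(2)  # purely exploratory behaviour policy
        next_state, reward = simulate_step(state, action, rng)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action] ** 0.7  # decaying step size
        # Sample-based version of the Bellman optimality update.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the action with the largest estimated Q-value in each state."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(len(q))]


if __name__ == "__main__":
    q_values = q_learning()
    print("Q-values:", q_values)
    print("Greedy policy:", greedy_policy(q_values))
```

In this toy model the best action is to switch in state 0 and to stay in state 1, so the greedy policy should come out as [1, 0]. Solving the Bellman equation for this model by hand gives an optimal value of 0.9 / (1 - 0.8) = 4.5 in both states, which the learned Q-values for the greedy actions should approach.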

Keywords: Reinforcement Learning; Preventive Maintenance; Bellman Equation; Policy Iteration; Average Reward
Date: 2015

Persistent link: https://EconPapers.repec.org/RePEc:spr:isochp:978-1-4939-1384-8_13

Ordering information: This item can be ordered from
http://www.springer.com/9781493913848

DOI: 10.1007/978-1-4939-1384-8_13

Series: International Series in Operations Research & Management Science, Springer