A Comparison of Policy Iteration Methods for Solving Continuous-State, Infinite-Horizon Markovian Decision Problems Using Random, Quasi-random, and Deterministic Discretizations
Abstract:
This paper compares the performance of the Howard (1960) policy iteration algorithm for infinite-horizon continuous-state Markovian decision processes (MDPs) using alternative random, quasi-random, and deterministic discretizations of the state space, or grids. Each grid corresponds to an embedded finite-state MDP whose solution is used to approximate the solution to the original continuous-state MDP. I extend a result of Rust (1997) to show that policy iteration using random grids succeeds in breaking the curse of dimensionality involved in approximating the solution to a class of continuous-state, discrete-action MDPs known as discrete decision processes (DDPs). I compare this "random policy iteration" (RPI) algorithm with policy iteration algorithms using deterministically chosen grids, including uniform grids and quadrature grids, both of which are subject to the curse of dimensionality. I also compare the RPI algorithm to deterministic policy iteration algorithms based on quasi-random or "low-discrepancy" grids, such as the Sobol' and Tezuka sequences.
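To make the idea of an embedded finite-state MDP concrete, the following is a minimal sketch of policy iteration on a random grid, not the paper's implementation. It assumes the state space is the unit hypercube, that grid points are drawn uniformly, and that transition probabilities between grid points are obtained by normalizing a transition density over the sampled points, in the spirit of the random Bellman operator of Rust (1997). The names `random_policy_iteration`, `u` (per-period reward), and `p` (transition density) are hypothetical, as is the toy problem used to call it.

```python
import numpy as np

def random_policy_iteration(u, p, d, n_actions, beta, n_points, seed=0, max_iter=100):
    """Howard-style policy iteration on a randomly drawn grid (illustrative sketch).

    u(s, a)         -> reward at state s (array of length d) under action a
    p(s_next, s, a) -> strictly positive transition density (assumed)
    """
    rng = np.random.default_rng(seed)
    grid = rng.random((n_points, d))          # uniform draws on [0, 1]^d

    # Rewards R[a, i] and row-normalized transition weights P[a, i, j].
    R = np.array([[u(s, a) for s in grid] for a in range(n_actions)], dtype=float)
    P = np.empty((n_actions, n_points, n_points))
    for a in range(n_actions):
        for i, s in enumerate(grid):
            w = np.array([p(s_next, s, a) for s_next in grid], dtype=float)
            P[a, i] = w / w.sum()             # embedded finite-state transition row

    idx = np.arange(n_points)
    policy = np.zeros(n_points, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_pi) V = R_pi exactly.
        P_pi = P[policy, idx]
        R_pi = R[policy, idx]
        V = np.linalg.solve(np.eye(n_points) - beta * P_pi, R_pi)
        # Policy improvement: greedy action at every grid point.
        Q = R + beta * (P @ V)                # shape (n_actions, n_points)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return grid, V, policy

# Hypothetical two-action toy problem on [0, 1]^2.
def toy_reward(s, a):
    return float(s.mean()) if a == 0 else 0.5

def toy_density(s_next, s, a):
    # Unnormalized Gaussian-shaped kernel; action 1 pulls the state toward the center.
    target = s if a == 0 else np.full_like(s, 0.5)
    return float(np.exp(-10.0 * np.sum((s_next - target) ** 2)))

grid, V, policy = random_policy_iteration(
    toy_reward, toy_density, d=2, n_actions=2, beta=0.9, n_points=200
)
print(V.min(), V.max(), policy.mean())
```

The policy evaluation step costs a linear solve in the number of grid points, so in this sketch the work depends on the dimension `d` only through the cost of evaluating `u` and `p`. Swapping the uniform draws for a deterministic or low-discrepancy point set would give the other grid types compared in the paper, under the same assumptions.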