A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Ghate, Archis; Smith, Robert L.

A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Archis Ghate () and Robert L. Smith ()
Additional contact information
Archis Ghate: Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195
Robert L. Smith: Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Operations Research, 2013, vol. 61, issue 2, 413-425

Abstract: Nonstationary infinite-horizon Markov decision processes (MDPs) generalize the most well-studied class of sequential decision models in operations research, namely, that of stationary MDPs, by relaxing the restrictive assumption that problem data do not change over time. Linear programming (LP) has been very successful in obtaining structural insights and devising solution methods for stationary MDPs. However, an LP approach for nonstationary MDPs is currently missing. This is because the LP formulation of a nonstationary infinite-horizon MDP includes countably infinite variables and constraints, and research on such infinite-dimensional LPs has traditionally faced several hurdles. For instance, duality results may not hold; an extreme point may not be a basic feasible solution; and in the context of a simplex algorithm, a pivot operation may require infinite data and computations, and a sequence of improving extreme points need not converge in value to optimal. In this paper, we tackle these challenges and establish (1) weak and strong duality, (2) complementary slackness, (3) a basic feasible solution characterization of extreme points, (4) a one-to-one correspondence between extreme points and deterministic Markovian policies, and (5) we devise a simplex algorithm for an infinite-dimensional LP formulation of nonstationary infinite-horizon MDPs. Pivots in this simplex algorithm use finite data, perform finite computations, and generate a sequence of improving extreme points that converges in value to optimal. Moreover, this sequence of extreme points gets arbitrarily close to the set of optimal extreme points. We also prove that decisions prescribed by these extreme points are eventually exactly optimal in all states of the nonstationary infinite-horizon MDP in early periods.

Keywords: simplex algorithm; infinite linear programs; dynamic programming (search for similar items in EconPapers)
Date: 2013
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (5)

Downloads: (external link)
http://dx.doi.org/10.1287/opre.1120.1121 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:61:y:2013:i:2:p:413-425

Access Statistics for this article

More articles in Operations Research from INFORMS Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher ().