On the Taylor Expansion of Value Functions

Anton Braverman, Itai Gurvich and Junfei Huang
Additional contact information
Anton Braverman: Kellogg School of Management, Northwestern University, Evanston, Illinois 60208
Itai Gurvich: Cornell School of Operations Research and Information Engineering and Cornell Tech, New York, New York 10044
Junfei Huang: Department of Decision Sciences and Managerial Economics, CUHK Business School, Chinese University of Hong Kong, Shatin, Hong Kong

Operations Research, 2020, vol. 68, issue 2, 631-654

Abstract: We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on ℤ₊^d with countable action sets. The framework is grounded in the approximation of the (controlled) chain’s generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time in which the transition matrix is reduced to its first and second moments. In some cases, the resulting equation can be interpreted as a Hamilton–Jacobi–Bellman equation for a Brownian control problem. When tractable, the “Taylored” equation serves as a useful modeling tool. More generally, it is a starting point for approximation algorithms. We develop bounds on the optimality gap, that is, the suboptimality introduced by using the control produced by the Taylored equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. We prove that under suitable conditions and for suitably “large” initial states, (1) the optimality gap is smaller than a 1 − α fraction of the optimal value, where α ∈ (0, 1) is the discount factor, and (2) the gap can be further expressed as the infinite-horizon discounted value with a “lower-order” per-period reward. Computationally, our framework leads to an “aggregation” approach with performance guarantees. Although the guarantees are grounded in partial differential equation theory, the practical use of this approach requires no knowledge of that theory.
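As a rough illustration of the expansion step described in the abstract (the notation V, ξ, μ_a, Σ_a, r used here is illustrative and not taken from the paper), a minimal LaTeX sketch:

% Sketch of the second-order expansion behind the "Taylored" equation.
% V is a value function, x the current state, \xi the one-step increment under action a,
% \mu_a(x) = E_a[\xi | x] and \Sigma_a(x) = E_a[\xi \xi^\top | x] its first and second moments.
\[
  \mathbb{E}_a\!\left[ V(x+\xi) \right]
  \;\approx\;
  V(x) + \mu_a(x)^{\top} \nabla V(x)
  + \tfrac{1}{2}\,\operatorname{tr}\!\left( \Sigma_a(x)\, \nabla^{2} V(x) \right).
\]
% Substituting this into the discounted Bellman equation
% V(x) = \max_a \{ r(x,a) + \alpha\, \mathbb{E}_a[ V(x+\xi) ] \}
% replaces the transition matrix by its first and second moments and yields a
% continuous-space equation of Hamilton-Jacobi-Bellman type:
\[
  (1-\alpha)\, V(x)
  \;=\;
  \max_a \Big\{ r(x,a)
  + \alpha\, \mu_a(x)^{\top} \nabla V(x)
  + \tfrac{\alpha}{2}\,\operatorname{tr}\!\left( \Sigma_a(x)\, \nabla^{2} V(x) \right) \Big\}.
\]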

Keywords: Taylor expansion; Markov decision process; Hamilton–Jacobi–Bellman equation; approximate dynamic programming; approximate policy iteration
Date: 2020

Downloads: https://doi.org/10.1287/opre.2019.1903


Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:68:y:2020:i:2:p:631-654

Handle: RePEc:inm:oropre:v:68:y:2020:i:2:p:631-654