On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning
D. P. De Farias and
B. Van Roy
Additional contact information
D. P. De Farias: Stanford University
B. Van Roy: Stanford University
Journal of Optimization Theory and Applications, 2000, vol. 105, issue 3, No 7, 589-608
Abstract:
Abstract Approximate value iteration is a simple algorithm that combats the curse of dimensionality in dynamic programs by approximating iterates of the classical value iteration algorithm in a spirit reminiscent of statistical regression. Each iteration of this algorithm can be viewed as an application of a modified dynamic programming operator to the current iterate. The hope is that the iterates converge to a fixed point of this operator, which will then serve as a useful approximation of the optimal value function. In this paper, we show that, in general, the modified dynamic programming operator need not possess a fixed point; therefore, approximate value iteration should not be expected to converge. We then propose a variant of approximate value iteration for which the associated operator is guaranteed to possess at least one fixed point. This variant is motivated by studies of temporal-difference (TD) learning, and existence of fixed points implies here existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates exploration.
Keywords: dynamic programming; neurodynamic programming; reinforcement learning; temporal-difference learning; value iteration (search for similar items in EconPapers)
Date: 2000
References: View complete reference list from CitEc
Citations: View citations in EconPapers (5)
Downloads: (external link)
http://link.springer.com/10.1023/A:1004641123405 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:joptap:v:105:y:2000:i:3:d:10.1023_a:1004641123405
Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10957/PS2
DOI: 10.1023/A:1004641123405
Access Statistics for this article
Journal of Optimization Theory and Applications is currently edited by Franco Giannessi and David G. Hull
More articles in Journal of Optimization Theory and Applications from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().