A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning
Hoai An Le Thi (),
Vinh Thanh Ho () and
Tao Pham Dinh ()
Additional contact information
Hoai An Le Thi: University of Lorraine
Vinh Thanh Ho: University of Lorraine
Tao Pham Dinh: University of Normandie
Journal of Global Optimization, 2019, vol. 73, issue 2, No 2, 279-310
Abstract:
Abstract We investigate a powerful nonconvex optimization approach based on Difference of Convex functions (DC) programming and DC Algorithm (DCA) for reinforcement learning, a general class of machine learning techniques which aims to estimate the optimal learning policy in a dynamic environment typically formulated as a Markov decision process (with an incomplete model). The problem is tackled as finding the zero of the so-called optimal Bellman residual via the linear value-function approximation for which two optimization models are proposed: minimizing the $$\ell _{p}$$ ℓ p -norm of a vector-valued convex function, and minimizing a concave function under linear constraints. They are all formulated as DC programs for which attractive DCA schemes are developed. Numerical experiments on various examples of the two benchmarks of Markov decision process problems—Garnet and Gridworld problems, show the efficiency of our approaches in comparison with two existing DCA based algorithms and two state-of-the-art reinforcement learning algorithms.
Keywords: Batch reinforcement learning; Markov decision process; DC programming; DCA; Optimal Bellman residual (search for similar items in EconPapers)
Date: 2019
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://link.springer.com/10.1007/s10898-018-0698-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:73:y:2019:i:2:d:10.1007_s10898-018-0698-y
Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898
DOI: 10.1007/s10898-018-0698-y
Access Statistics for this article
Journal of Global Optimization is currently edited by Sergiy Butenko
More articles in Journal of Global Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().