A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning

Thi, Hoai An Le; Ho, Vinh Thanh; Dinh, Tao Pham

Hoai An Le Thi, Vinh Thanh Ho and Tao Pham Dinh
Additional contact information
Hoai An Le Thi: University of Lorraine
Vinh Thanh Ho: University of Lorraine
Tao Pham Dinh: University of Normandie

Journal of Global Optimization, 2019, vol. 73, issue 2, No 2, 279-310

Abstract: We investigate a powerful nonconvex optimization approach based on Difference of Convex functions (DC) programming and the DC Algorithm (DCA) for reinforcement learning, a general class of machine learning techniques which aims to estimate the optimal learning policy in a dynamic environment typically formulated as a Markov decision process (with an incomplete model). The problem is tackled as finding the zero of the so-called optimal Bellman residual via linear value-function approximation, for which two optimization models are proposed: minimizing the $\ell_{p}$-norm of a vector-valued convex function, and minimizing a concave function under linear constraints. Both are formulated as DC programs for which attractive DCA schemes are developed. Numerical experiments on various examples of two benchmark Markov decision process problems (Garnet and Gridworld) show the efficiency of our approaches in comparison with two existing DCA based algorithms and two state-of-the-art reinforcement learning algorithms.
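To make the first model in the abstract concrete, the following is a minimal sketch of the quantity being minimized: the $\ell_{p}$-norm of the optimal Bellman residual under a linear value-function approximation. It is not the authors' implementation and does not perform any DCA step. The two-state MDP, the names `P`, `R`, `PHI`, `GAMMA`, and both helper functions are hypothetical and chosen only for illustration. The sketch also assumes the transition model is fully known, whereas the paper works in the batch setting with an incomplete model.

```python
# Hypothetical toy MDP with 2 states and 2 actions (all numbers illustrative).
GAMMA = 0.9  # discount factor

# P[a][s][s2]: probability of moving from state s to state s2 under action a.
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0 always leads to state 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1 always leads to state 1
]

# R[s][a]: immediate reward for taking action a in state s.
R = [[0.0, 1.0], [0.0, 1.0]]

# PHI[s]: feature vector of state s (one-hot, i.e. tabular features).
PHI = [[1.0, 0.0], [0.0, 1.0]]


def value(theta, s):
    """Linear value-function approximation V(s) = PHI[s] . theta."""
    return sum(f * t for f, t in zip(PHI[s], theta))


def optimal_bellman_residual(theta):
    """Return the vector with entries (T*V)(s) - V(s), where V = PHI theta.

    Each entry is a maximum of functions affine in theta minus a function
    linear in theta, so each entry is convex in theta.
    """
    n_states, n_actions = len(PHI), len(P)
    residual = []
    for s in range(n_states):
        backups = [
            R[s][a]
            + GAMMA * sum(P[a][s][s2] * value(theta, s2) for s2 in range(n_states))
            for a in range(n_actions)
        ]
        residual.append(max(backups) - value(theta, s))
    return residual


def lp_norm(vec, p=1):
    """l_p norm of a vector for a finite p >= 1."""
    return sum(abs(x) ** p for x in vec) ** (1.0 / p)


if __name__ == "__main__":
    for theta in ([0.0, 0.0], [10.0, 10.0]):
        print(theta, lp_norm(optimal_bellman_residual(theta), p=1))
```

In this toy problem always taking action 1 is optimal and yields a value of 1 / (1 - 0.9) = 10 in both states, so the residual vanishes at `theta = [10.0, 10.0]`. That is the "zero of the optimal Bellman residual" the abstract refers to.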

Keywords: Batch reinforcement learning; Markov decision process; DC programming; DCA; Optimal Bellman residual
Date: 2019

Downloads: (external link)
http://link.springer.com/10.1007/s10898-018-0698-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:73:y:2019:i:2:d:10.1007_s10898-018-0698-y

Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898

DOI: 10.1007/s10898-018-0698-y

Journal of Global Optimization is currently edited by Sergiy Butenko

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.