Combining Correlation-Based and Reward-Based Learning in Neural Control for Policy Improvement

Poramate Manoonpong, Christoph Kolodziejski, Florentin Wörgötter and Jun Morimoto
Additional contact information
Poramate Manoonpong: Bernstein Center for Computational Neuroscience, The Third Institute of Physics, University of Göttingen, Göttingen 37077, Germany;
Christoph Kolodziejski: Bernstein Center for Computational Neuroscience, The Third Institute of Physics, University of Göttingen, Göttingen 37077, Germany
Florentin Wörgötter: Bernstein Center for Computational Neuroscience, The Third Institute of Physics, University of Göttingen, Göttingen 37077, Germany
Jun Morimoto: Bernstein Center for Computational Neuroscience, The Third Institute of Physics, University of Göttingen, Göttingen 37077, Germany;

Advances in Complex Systems (ACS), 2013, vol. 16, issue 02n03, 1-38

Abstract: Classical conditioning (conventionally modeled as correlation-based learning) and operant conditioning (conventionally modeled as reinforcement learning or reward-based learning) have both been observed in biological systems, and evidence shows that both mechanisms strongly involve learning about associations. Based on these biological findings, we propose a new learning model for achieving successful control policies in artificial systems. The model combines correlation-based learning, using input correlation learning (ICO learning), with reward-based learning, using continuous actor-critic reinforcement learning (RL), thereby working as a dual learner system. Its performance is evaluated in simulations of a cart-pole system as a dynamic motion control problem and a mobile robot system as a goal-directed behavior control problem. Results show that the combined model strongly improves the pole-balancing control policy: the controller learns to stabilize the pole over a larger domain of initial conditions than either learning mechanism achieves alone. The model also finds a successful control policy for goal-directed behavior: the robot learns to approach a given goal more effectively than with either of its individual components. Thus, the study pursued here sharpens our understanding of how two different learning mechanisms can be combined and complement each other for solving complex tasks.
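The abstract describes the combined controller only at a high level. The sketch below is a minimal, illustrative Python example (not the authors' implementation) of how an ICO-learning update and a linear actor-critic update might be combined into a single motor command. The class names, feature encoding, gains, and the one-step toy plant are assumptions introduced purely for illustration; the paper's cart-pole and mobile-robot setups are more elaborate.

```python
# Minimal sketch of a dual-learner controller: correlation-based ICO learning
# plus a simple continuous actor-critic RL update. Illustrative only; names,
# features, and gains are assumptions, not the paper's actual implementation.
import numpy as np

class IcoLearner:
    """Input correlation (ICO) learning: predictive weights change in
    proportion to the correlation between a predictive input and the
    temporal derivative of the reflex (error) signal."""
    def __init__(self, n_inputs, lr=0.01, reflex_gain=1.0):
        self.w = np.zeros(n_inputs)      # weights of predictive inputs
        self.lr = lr
        self.reflex_gain = reflex_gain   # fixed weight of the reflex input
        self.prev_reflex = 0.0

    def step(self, predictive, reflex):
        d_reflex = reflex - self.prev_reflex        # discrete-time derivative
        self.w += self.lr * predictive * d_reflex   # ICO weight update
        self.prev_reflex = reflex
        return self.reflex_gain * reflex + self.w @ predictive

class ActorCritic:
    """Continuous actor-critic with linear function approximation and
    Gaussian exploration (a common simplification of the RL component)."""
    def __init__(self, n_features, lr_actor=0.01, lr_critic=0.1,
                 gamma=0.95, sigma=0.1):
        self.theta = np.zeros(n_features)   # actor weights
        self.v = np.zeros(n_features)       # critic weights
        self.lr_a, self.lr_c = lr_actor, lr_critic
        self.gamma, self.sigma = gamma, sigma

    def act(self, phi):
        mean = self.theta @ phi
        noise = np.random.normal(0.0, self.sigma)   # exploration noise
        return mean + noise, noise

    def update(self, phi, phi_next, reward, noise):
        td_error = reward + self.gamma * (self.v @ phi_next) - self.v @ phi
        self.v += self.lr_c * td_error * phi              # critic: TD(0)
        self.theta += self.lr_a * td_error * noise * phi  # actor: noise-weighted TD
        return td_error

# Usage sketch: one control step of the dual learner on toy signals.
ico = IcoLearner(n_inputs=1)
ac = ActorCritic(n_features=2)
x = 1.0                                   # toy plant state (e.g., pole angle)
phi = np.array([x, 1.0])                  # simple state features
u_ico = ico.step(predictive=np.array([x]), reflex=x)
u_rl, noise = ac.act(phi)
u = u_ico + u_rl                          # combined motor command (dual learner)
x_next = 0.9 * x - 0.1 * u                # stand-in plant dynamics
ac.update(phi, np.array([x_next, 1.0]), reward=-x_next ** 2, noise=noise)
```

The design point the sketch tries to convey is that the two learners act on the same control signal: the ICO component adapts quickly from the correlation between predictive inputs and the reflex error, while the actor-critic component refines the policy from the reward signal.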

Keywords: Classical conditioning; operant conditioning; associative learning; reinforcement learning; pole balancing; goal-directed behavior
Date: 2013
References: View complete reference list from CitEc

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S021952591350015X
Access to full text is restricted to subscribers

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Persistent link: https://EconPapers.repec.org/RePEc:wsi:acsxxx:v:16:y:2013:i:02n03:n:s021952591350015x

Ordering information: This journal article can be ordered from the publisher, World Scientific Publishing Co. Pte. Ltd.

DOI: 10.1142/S021952591350015X

Advances in Complex Systems (ACS) is currently edited by Frank Schweitzer

More articles in Advances in Complex Systems (ACS) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:acsxxx:v:16:y:2013:i:02n03:n:s021952591350015x