Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning

García, Javier; Iglesias, Roberto; Rodríguez, Miguel A.; Regueiro, Carlos V.

Javier García: CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Roberto Iglesias: CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Miguel A. Rodríguez: CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Carlos V. Regueiro: Department of Electronics and Systems, Universidade da Coruña, A Coruña, Spain

International Journal of Information Technology & Decision Making (IJITDM), 2019, vol. 18, issue 03, 1045-1082

Abstract: Real-world problems usually involve the optimization of multiple, possibly conflicting, objectives. Such problems can be addressed with Multi-Objective Reinforcement Learning (MORL) techniques. MORL generalizes standard Reinforcement Learning (RL) by extending the single reward signal to multiple signals, one per objective, and is the process of learning policies that optimize these objectives simultaneously. In such problems, directional (gradient) information can help guide exploration toward progressively better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require a batch of episodes to estimate the gradient properly, which slows learning, and they rely on stochastic policies, which can have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It has two main features: fast computation of the gradient of each objective from neighboring solutions, and the use of this gradient information to carry out a geometric partition of the search space and thus direct exploration toward promising areas. Finally, the algorithm is evaluated and compared with policy-gradient MORL algorithms on different multi-objective problems: the water reservoir problem and the biped walking problem (the latter both in simulation and on a real robot).
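The abstract does not spell out how the per-objective gradients are computed or how the partition is built. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes each objective's gradient at a solution is estimated by a first-order least-squares fit over already-evaluated neighboring parameter vectors (so no extra episodes are needed), and it assumes the "geometric partition" is the split of search directions by the hyperplanes orthogonal to each estimated gradient, with the cell where every objective improves treated as the promising one. The names `estimate_gradient`, `sign_pattern` and `in_common_ascent_cone`, and the toy objectives in the usage part, are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _solve(a: List[List[float]], b: Sequence[float]) -> Vector:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: neighbors do not span the parameter space")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_gradient(center: Sequence[float], f_center: float,
                      neighbors: Sequence[Sequence[float]],
                      f_neighbors: Sequence[float]) -> Vector:
    """Hypothetical helper: fit f(x) ~ f(center) + g . (x - center) over neighbors.

    Solves the normal equations (D^T D) g = D^T dy, where each row of D is a
    neighbor's offset from the center and dy holds the matching change in f.
    """
    d = len(center)
    deltas = [[xi - ci for xi, ci in zip(x, center)] for x in neighbors]
    dy = [fy - f_center for fy in f_neighbors]
    ata = [[sum(row[i] * row[j] for row in deltas) for j in range(d)] for i in range(d)]
    aty = [sum(row[i] * y for row, y in zip(deltas, dy)) for i in range(d)]
    return _solve(ata, aty)


def sign_pattern(direction: Sequence[float], gradients: Sequence[Sequence[float]],
                 tol: float = 1e-12) -> Tuple[int, ...]:
    """Label the cell of direction space that `direction` falls in.

    Each gradient g_i defines a hyperplane {d : d . g_i = 0}; the tuple of signs
    of d . g_i identifies the cell of this hyperplane arrangement containing d.
    """
    signs = []
    for g in gradients:
        dot = sum(di * gi for di, gi in zip(direction, g))
        signs.append(1 if dot > tol else (-1 if dot < -tol else 0))
    return tuple(signs)


def in_common_ascent_cone(direction: Sequence[float],
                          gradients: Sequence[Sequence[float]]) -> bool:
    """True if `direction` improves every objective to first order (all signs positive)."""
    return all(s > 0 for s in sign_pattern(direction, gradients))


if __name__ == "__main__":
    # Toy linear objectives in a 2-D parameter space (illustrative data only):
    # f1(x) = x0 + 2*x1 and f2(x) = -x0 + x1, evaluated at a center and four neighbors.
    center = [0.0, 0.0]
    neighbors = [[1, 0], [0, 1], [-1, 0], [0, -1]]
    g1 = estimate_gradient(center, 0.0, neighbors, [1, 2, -1, -2])
    g2 = estimate_gradient(center, 0.0, neighbors, [-1, 1, 1, -1])
    print(g1, g2)                                    # [1.0, 2.0] [-1.0, 1.0]
    print(sign_pattern([1, 0], [g1, g2]))            # (1, -1): improves f1, worsens f2
    print(in_common_ascent_cone([-1, 1], [g1, g2]))  # True: improves both objectives
```

A least-squares fit is used here instead of coordinate-wise finite differences because population members rarely lie on a regular grid. It needs at least as many affinely independent neighbors as there are parameters; otherwise `_solve` raises `ValueError`.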

Keywords: Reinforcement learning; multi-objective optimization; robotic tasks; policy search; black-box optimization
Date: 2019

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219622019500093
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijitdm:v:18:y:2019:i:03:n:s0219622019500093

DOI: 10.1142/S0219622019500093

International Journal of Information Technology & Decision Making (IJITDM) is currently edited by Yong Shi

Publisher: World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.