Analysis of Hyper-Parameters for AlphaZero-Like Deep Reinforcement Learning

Wang, Hui; Emmerich, Michael; Preuss, Mike; Plaat, Aske

Additional contact information
Hui Wang: Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands
Michael Emmerich: Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands
Mike Preuss: Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands
Aske Plaat: Universiteit Leiden, Leiden Institute of Advanced Computer Science, Leiden, Netherlands

International Journal of Information Technology & Decision Making (IJITDM), 2023, vol. 22, issue 02, 829-853

Abstract: The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then itself used in tree searches. The training is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. Through multi-objective analysis, we identify four important hyper-parameters for further assessment. First, we find the surprising result that too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS search simulations, game episodes and training epochs. Based on our experiments, we provide recommendations on setting hyper-parameter values in self-play. The outer loop of self-play iterations should be emphasized over the inner loop. This means that hyper-parameters for the inner loop should be set to lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
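
The loop structure the abstract refers to can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: the names `SelfPlayConfig`, `run_self_play` and `moves_per_game` are hypothetical, the fixed game length is an assumption, and no real game, MCTS or neural network is involved. It simply counts how much work each hyper-parameter controls, to show which settings belong to the outer loop (self-play iterations) and which to the inner loop (episodes, MCTS simulations, epochs).

```python
from dataclasses import dataclass


@dataclass
class SelfPlayConfig:
    """Hypothetical container for four of the hyper-parameters named in the abstract."""
    iterations: int        # outer loop: number of self-play iterations
    episodes: int          # inner loop: self-play games per iteration
    mcts_simulations: int  # inner loop: MCTS simulations per move
    epochs: int            # inner loop: training epochs per iteration


def run_self_play(cfg: SelfPlayConfig, moves_per_game: int = 10) -> dict:
    """Walk the loop structure and count the work done at each level.

    Assumption: every game lasts exactly `moves_per_game` moves. The MCTS
    search and the network update are replaced by simple counters.
    """
    counts = {"simulations": 0, "examples": 0, "epoch_passes": 0}
    for _ in range(cfg.iterations):              # outer loop
        examples = []
        for _ in range(cfg.episodes):            # inner loop: play games
            for _ in range(moves_per_game):
                # A real system would run MCTS here to choose the move.
                counts["simulations"] += cfg.mcts_simulations
                examples.append(None)            # placeholder training example
        counts["examples"] += len(examples)
        for _ in range(cfg.epochs):              # inner loop: train the network
            counts["epoch_passes"] += 1
    return counts


if __name__ == "__main__":
    cfg = SelfPlayConfig(iterations=3, episodes=2, mcts_simulations=5, epochs=4)
    print(run_self_play(cfg))
```

In this sketch, raising `iterations` scales every counter at once, whereas raising an inner-loop value scales only the work inside a single iteration. The actual trade-offs between these settings are the subject of the paper's experiments and are not reproduced here.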

Keywords: AlphaZero; parameter sweep; parameter evaluation; loss function
Date: 2023

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219622022500547
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijitdm:v:22:y:2023:i:02:n:s0219622022500547

DOI: 10.1142/S0219622022500547

International Journal of Information Technology & Decision Making (IJITDM) is currently edited by Yong Shi

More articles in International Journal of Information Technology & Decision Making (IJITDM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.