Mastering Atari, Go, chess and shogi by planning with a learned model

Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David

Author affiliations:
Julian Schrittwieser: DeepMind
Ioannis Antonoglou: DeepMind
Thomas Hubert: DeepMind
Karen Simonyan: DeepMind
Laurent Sifre: DeepMind
Simon Schmitt: DeepMind
Arthur Guez: DeepMind
Edward Lockhart: DeepMind
Demis Hassabis: DeepMind
Thore Graepel: DeepMind
Timothy Lillicrap: DeepMind
David Silver: DeepMind

Nature, 2020, vol. 588, issue 7839, 604-609

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess [1] and Go [2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games [3], the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled [4], the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm [5] that was supplied with the rules of the game.
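
The abstract's "iterable model" can be illustrated with a toy sketch. The Python below is not the paper's architecture or training procedure: the class `ToyLearnedModel`, its methods `represent`, `step` and `predict`, the constants, and the random untrained weights are all hypothetical stand-ins. It only shows the general shape of the idea, assuming a model that encodes an observation once and is then iterated over a sequence of actions, emitting a policy, a value and a reward at each step without querying the environment.

```python
import math
import random

NUM_ACTIONS = 3   # hypothetical toy action space
LATENT_DIM = 4    # hypothetical latent state size


def make_matrix(rows, cols, rng):
    """Random weight matrix; stands in for trained network parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyLearnedModel:
    """Untrained stand-in for a learned model that can be iterated in latent space."""

    def __init__(self, obs_dim, seed=0):
        rng = random.Random(seed)
        self.w_repr = make_matrix(LATENT_DIM, obs_dim, rng)
        self.w_dyn = make_matrix(LATENT_DIM, LATENT_DIM + NUM_ACTIONS, rng)
        self.w_reward = make_matrix(1, LATENT_DIM + NUM_ACTIONS, rng)
        self.w_policy = make_matrix(NUM_ACTIONS, LATENT_DIM, rng)
        self.w_value = make_matrix(1, LATENT_DIM, rng)

    def represent(self, observation):
        """Encode an observation into a latent state."""
        return [math.tanh(v) for v in matvec(self.w_repr, observation)]

    def step(self, state, action):
        """Advance the latent state by one action and predict the reward."""
        one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
        inputs = state + one_hot
        next_state = [math.tanh(v) for v in matvec(self.w_dyn, inputs)]
        reward = matvec(self.w_reward, inputs)[0]
        return next_state, reward

    def predict(self, state):
        """Predict the action-selection policy and the value for a latent state."""
        policy = softmax(matvec(self.w_policy, state))
        value = matvec(self.w_value, state)[0]
        return policy, value


def unroll(model, observation, actions):
    """Iterate the model along an action sequence without touching an environment."""
    state = model.represent(observation)
    policy, value = model.predict(state)
    trajectory = [{"reward": 0.0, "policy": policy, "value": value}]
    for action in actions:
        state, reward = model.step(state, action)
        policy, value = model.predict(state)
        trajectory.append({"reward": reward, "policy": policy, "value": value})
    return trajectory


# Example: one observation, three imagined actions, four sets of predictions.
model = ToyLearnedModel(obs_dim=5)
outputs = unroll(model, [0.1, -0.2, 0.3, 0.0, 0.5], actions=[0, 2, 1])
```

A tree-based search would call something like `step` and `predict` repeatedly while expanding nodes; that search, and the training that would make these predictions meaningful, are omitted here.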

Date: 2020

Downloads:
https://www.nature.com/articles/s41586-020-03051-4 (abstract)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:588:y:2020:i:7839:d:10.1038_s41586-020-03051-4

DOI: 10.1038/s41586-020-03051-4
