Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel and Demis Hassabis
Additional contact information
David Silver: Google DeepMind
Aja Huang: Google DeepMind
Chris J. Maddison: Google DeepMind
Arthur Guez: Google DeepMind
Laurent Sifre: Google DeepMind
George van den Driessche: Google DeepMind
Julian Schrittwieser: Google DeepMind
Ioannis Antonoglou: Google DeepMind
Veda Panneershelvam: Google DeepMind
Marc Lanctot: Google DeepMind
Sander Dieleman: Google DeepMind
Dominik Grewe: Google DeepMind
John Nham: Google, 1600 Amphitheatre Parkway, Mountain View
Nal Kalchbrenner: Google DeepMind
Ilya Sutskever: Google, 1600 Amphitheatre Parkway, Mountain View
Timothy Lillicrap: Google DeepMind
Madeleine Leach: Google DeepMind
Koray Kavukcuoglu: Google DeepMind
Thore Graepel: Google DeepMind
Demis Hassabis: Google DeepMind

Nature, 2016, vol. 529, issue 7587, 484-489

Abstract: The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
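The abstract describes a search that combines a policy network's move priors with value estimates inside Monte Carlo tree search. As a rough illustration of how these two signals interact, the following is a minimal PUCT-style selection sketch in Python; the `Edge` class, the constant `C_PUCT`, and the toy backup numbers are assumptions made for illustration, not AlphaGo's actual implementation.

```python
import math

C_PUCT = 1.0  # exploration constant (illustrative value)

class Edge:
    """Search statistics for one candidate move: visit count N,
    accumulated value W, and prior probability P from a policy network."""
    def __init__(self, prior):
        self.N = 0
        self.W = 0.0
        self.P = prior

    @property
    def Q(self):
        # Mean value of the move so far (0 if never visited).
        return self.W / self.N if self.N else 0.0

def select_move(edges):
    """Pick the move maximising Q + U, where the exploration bonus U
    decays with visits, steering simulations toward promising,
    under-explored moves."""
    total = sum(e.N for e in edges.values())
    return max(edges, key=lambda m: edges[m].Q
               + C_PUCT * edges[m].P * math.sqrt(total) / (1 + edges[m].N))

# Toy search: priors come from a hypothetical policy network; each
# simulation backs up a hypothetical value-network evaluation.
edges = {"a": Edge(prior=0.7), "b": Edge(prior=0.3)}
for _ in range(10):
    m = select_move(edges)
    edges[m].N += 1
    edges[m].W += 0.5 if m == "b" else 0.1  # pretend evaluations

# As in the paper's search, play the most-visited move at the root:
best = max(edges, key=lambda m: edges[m].N)
print(best)
```

Note how the low-prior move "b" ends up most visited once its backed-up evaluations reveal a higher value, which is the point of mixing value estimates with policy priors rather than trusting the prior alone.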

Date: 2016
Citations: View citations in EconPapers (269)

Downloads: https://www.nature.com/articles/nature16961 Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:529:y:2016:i:7587:d:10.1038_nature16961

Ordering information: This journal article can be ordered from
https://www.nature.com/

DOI: 10.1038/nature16961

Nature is currently edited by Magdalena Skipper

More articles in Nature from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:nat:nature:v:529:y:2016:i:7587:d:10.1038_nature16961