United Kingdom - 2019
For breakthrough advances in computer game-playing
The goal of designing algorithms that can win challenging games against human opponents has defined multiple grand challenges in AI. Algorithmic successes, such as playing checkers in the 1960s and chess in the 1990s, relied largely on brute-force search coupled with heuristic evaluation functions. Other games, such as Go, proved far more difficult because of their much larger branching factors. At the ACM Turing Centenary Celebration in 2012, in an informal straw poll, the majority of participants estimated a breakthrough in Go to be at least 20 years away. Yet it took only four years for an algorithm called AlphaGo, developed by David Silver and colleagues, to defeat Go world champion Lee Sedol.
David Silver is a pioneer in the rising and important area of deep reinforcement learning. In developing AlphaGo, Silver built on research he had initiated years earlier during his doctoral work at the University of Alberta. By deftly combining ideas from deep learning, reinforcement learning, traditional tree search, and large-scale computing, he and his team at DeepMind produced a breakthrough result that astonished the scientific world. The winning system couples a form of probabilistic search, called Monte Carlo tree search, with two deep neural networks that guide the search: a policy network that predicts the moves most likely to lead to a win, and a value network that limits the depth of search by learning to evaluate the positions it reaches.
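The architecture described above can be sketched in miniature. The following is a toy, single-agent illustration of policy/value-guided Monte Carlo tree search with PUCT-style selection; the game (add 1 or 2 to a counter, aiming to reach exactly 10) and the `policy_net`/`value_net` functions are hypothetical stand-ins for trained networks, and a real two-player search would also flip the sign of the backed-up value at each ply:

```python
import math

class Node:
    def __init__(self, state, prior):
        self.state = state
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def legal_actions(state):
    return [1, 2]                 # toy game: add 1 or 2 to the counter

def next_state(state, action):
    return state + action

def is_terminal(state):
    return state >= 10            # reaching exactly 10 wins; overshooting loses

def policy_net(state):
    # Stand-in for a trained policy network: uniform priors over legal moves.
    actions = legal_actions(state)
    return {a: 1.0 / len(actions) for a in actions}

def value_net(state):
    # Stand-in for a trained value network, scoring states in [-1, 1].
    if state == 10:
        return 1.0
    if state > 10:
        return -1.0
    return (state - 5) / 5.0      # crude heuristic for non-terminal states

def select_child(node, c_puct=1.5):
    # PUCT: balance exploitation (Q) against prior-weighted exploration (U).
    total = math.sqrt(node.visits)
    def score(item):
        _, child = item
        u = c_puct * child.prior * total / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=score)

def mcts(root_state, num_simulations=200):
    root = Node(root_state, prior=1.0)
    for _ in range(num_simulations):
        node, path = root, [root]
        # 1. Select down the tree with PUCT until reaching a leaf.
        while node.children:
            _, node = select_child(node)
            path.append(node)
        # 2. Expand the leaf using the policy network's priors...
        if not is_terminal(node.state):
            for action, p in policy_net(node.state).items():
                node.children[action] = Node(next_state(node.state, action), p)
        # 3. ...and evaluate it with the value network instead of a rollout.
        leaf_value = value_net(node.state)
        # 4. Back the value up along the path.
        for n in path:
            n.visits += 1
            n.value_sum += leaf_value
    # Play the most-visited root move.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Playing the most-visited root move, rather than the highest-scoring one, is deliberate: visit counts aggregate both the networks' prior guidance and what the search itself discovered.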
AlphaGo was initialized by training on expert human games, followed by reinforcement learning to improve its performance. Subsequently, Silver sought even more principled methods for achieving greater performance and generality. He developed the AlphaZero algorithm, which learned entirely by playing games against itself, starting without any human data or prior knowledge except the game rules. AlphaZero achieved superhuman performance in chess, shogi, and Go, demonstrating unprecedented generality of the game-playing methods. In chess, AlphaZero categorically defeated the world computer chess champion Stockfish, a high-performance program built on decades of specialized knowledge handcrafted by grandmasters and chess programming experts. Silver next led DeepMind's project to play StarCraft II, a radically different but stunningly hard challenge for learning systems because of its mix of temporal and spatial scales and its partial observability. Once again, David Silver had humans on the ropes, solving critically hard problems of learning and decision-making that few others have dared to attempt.
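The self-play idea can be illustrated with a deliberately tiny sketch. Everything here is a stand-in: a tabular value estimate replaces the deep network, epsilon-greedy move choice replaces the full tree search, and the game is a two-player counter game (add 1 or 2; reaching exactly 10 wins, overshooting loses). Only the shape of the loop mirrors the AlphaZero recipe: play against yourself using the current estimates, then regress every visited state toward the game's final outcome, starting from nothing but the rules.

```python
import random

def play_one_game(values, epsilon=0.2):
    """Self-play one game; return the visited (state, player) pairs and the winner."""
    state, player, history = 0, +1, []
    while True:
        history.append((state, player))
        actions = [1, 2]
        # Pick the move whose resulting state looks worst for the opponent.
        def score(a):
            s = state + a
            if s == 10:
                return float("inf")       # immediate win
            if s > 10:
                return float("-inf")      # immediate loss
            return -values.get(s, 0.0)    # negate the opponent's value
        if random.random() < epsilon:
            action = random.choice(actions)   # exploration
        else:
            action = max(actions, key=score)  # greedy under current estimates
        state += action
        if state >= 10:
            winner = player if state == 10 else -player
            return history, winner
        player = -player

def train(num_games=2000, lr=0.1, seed=0):
    random.seed(seed)
    values = {}   # state -> estimated value for the player to move
    for _ in range(num_games):
        history, winner = play_one_game(values)
        # "Train the network": move each visited state toward the outcome.
        for state, player in history:
            z = 1.0 if player == winner else -1.0
            v = values.get(state, 0.0)
            values[state] = v + lr * (z - v)
    return values
```

After a few thousand self-play games, the table recovers the game theory of the toy game (states 8 and 9 are wins for the player to move, state 7 is a loss) with no data beyond the rules, which is the core of the approach.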