Dr. Richard Sutton

ACM A. M. Turing Award (2024)

Canada - 2024

Citation

For developing the conceptual and algorithmic foundations of reinforcement learning

Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning, one of the most important approaches for creating intelligent systems. Barto is Professor Emeritus of Computer Science at the University of Massachusetts, Amherst; Sutton is Professor of Computing Science at the University of Alberta and Research Scientist at Keen Technologies.

The field of artificial intelligence (AI) is generally concerned with constructing agents, that is, entities that perceive and act. More intelligent agents are those that choose better courses of action. Thus, the notion that some courses of action are better than others is central to AI.

Reward, a term borrowed from psychology and neuroscience, denotes a signal provided to an agent related to the quality of its behavior; reinforcement learning (RL) is the process of learning to behave more successfully given this signal.

The idea of learning from reward has been familiar to animal trainers for thousands of years. Alan Turing himself, in his 1950 paper "Computing Machinery and Intelligence," proposed an approach to machine learning based on "rewards and punishments" and reported having conducted some initial experiments with this approach. Arthur Samuel's self-learning checker-playing program, demonstrated on television in 1956, was perhaps the first successful example of reinforcement learning, although it lacked any form of justification as to why or whether it would work.

Within the field of AI, little further progress occurred in this vein until the early 1980s, when Barto and his Ph.D. student Sutton, motivated by observations from psychology, began to formulate reinforcement learning as a general problem framework. They drew on the mathematical foundation provided by Markov decision processes (MDPs), wherein an agent makes decisions in a stochastic environment, receiving a reward signal after each transition and aiming to maximize its long-term cumulative reward. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows for the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems, as explained further below.

Barto and Sutton, jointly and with other authors, developed many of the basic algorithmic approaches for RL, including temporal difference learning, policy-gradient methods, and the use of neural networks as a tool to represent learned functions. They also proposed agent designs that combined learning and planning, demonstrating the value of acquiring knowledge of the environment as a basis for planning. Perhaps equally important was the textbook, Reinforcement Learning: An Introduction (1998), which is the standard reference in the field and has been cited over 75,000 times. It allowed thousands of researchers to understand and contribute to this emerging field. As a result, RL is among the most active research areas in computer science today.

The most prominent example of RL in recent years was the victory by AlphaGo over the best human Go players in 2016 and 2017; but RL has achieved success in many areas including robot motor skill learning, network congestion control, chip design, internet advertising, optimization, global supply chain optimization, improving the behavior and reasoning capabilities of chatbots, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication. Finally, a technology that was partly inspired by neuroscience has returned the favor: recent research, including work by Barto, has shown that specific RL algorithms developed in AI provide the best explanations for a wide range of findings concerning the dopamine system in the brain.

Press Release

2024 ACM A.M. Turing Award

ACM has named Andrew G. Barto and Richard S. Sutton as the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning—one of the most important approaches for creating intelligent systems.

Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computing Science at the University of Alberta, a Research Scientist at Keen Technologies, and a Fellow at Amii (Alberta Machine Intelligence Institute).

The ACM A.M. Turing Award, often referred to as the “Nobel Prize in Computing,” carries a $1 million prize with financial support provided by Google, Inc. The award is named for Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing.

What is Reinforcement Learning?

The field of artificial intelligence (AI) is generally concerned with constructing agents—that is, entities that perceive and act. More intelligent agents are those that choose better courses of action. Therefore, the notion that some courses of action are better than others is central to AI. Reward—a term borrowed from psychology and neuroscience—denotes a signal provided to an agent related to the quality of its behavior. Reinforcement learning (RL) is the process of learning to behave more successfully given this signal.

The idea of learning from reward has been familiar to animal trainers for thousands of years. Later, Alan Turing’s 1950 paper “Computing Machinery and Intelligence” addressed the question “Can machines think?” and proposed an approach to machine learning based on rewards and punishments.

While Turing reported having conducted some initial experiments with this approach and Arthur Samuel developed a checker-playing program in the late 1950s that learned from self-play, little further progress occurred in this vein of AI in the following decades. In the early 1980s, motivated by observations from psychology, Barto and his PhD student Sutton began to formulate reinforcement learning as a general problem framework.

They drew on the mathematical foundation provided by Markov decision processes (MDPs), wherein an agent makes decisions in a stochastic (randomly determined) environment, receiving a reward signal after each transition and aiming to maximize its long-term cumulative reward. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows for the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems, as explained further below.
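The interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state environment `TwoStateEnv`, its 80% transition probability, the reward scheme, and the discount factor `gamma` are all hypothetical and do not come from Barto and Sutton's papers. What the sketch tries to show is that the agent sees only states and rewards, never the environment's internal rules, and that its goal is expressed as a cumulative (here, discounted) sum of rewards.

```python
import random


class TwoStateEnv:
    """Hypothetical toy MDP. The agent never inspects these rules directly."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Action 1 tries to switch states and succeeds 80% of the time
        # (an assumed value); action 0 stays put.
        if action == 1 and self._rng.random() < 0.8:
            self.state = 1 - self.state
        # Assumed reward scheme: being in state 1 pays 1, state 0 pays 0.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def run_episode(env, policy, steps):
    """Let the policy act for a fixed number of steps and collect the rewards."""
    rewards = []
    state = env.state
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        rewards.append(reward)
    return rewards


# A hand-written policy: try to leave state 0, stay once in state 1.
rewards = run_episode(TwoStateEnv(seed=0), lambda s: 1 if s == 0 else 0, steps=20)
print(discounted_return(rewards, gamma=0.9))
```

With `gamma=1.0` the function reduces to the plain sum of rewards. A learning agent would replace the hand-written policy with one that improves from the observed rewards.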

Barto and Sutton, jointly and with others, developed many of the basic algorithmic approaches for RL. These include their foremost contribution, temporal difference learning, which made an important advance in solving reward prediction problems, as well as policy-gradient methods and the use of neural networks as a tool to represent learned functions. They also proposed agent designs that combined learning and planning, demonstrating the value of acquiring knowledge of the environment as a basis for planning.
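As a rough illustration of temporal difference learning for reward prediction, the sketch below applies the tabular TD(0) update to a small random-walk task. The task setup is an assumption chosen for illustration: five non-terminal states, a reward of 1 only on exiting to the right, a step size `alpha` of 0.02, and a fixed random seed. The function name `td0_random_walk` is hypothetical. The core idea is that each state's value estimate is nudged toward the observed reward plus the current estimate of the next state's value, without waiting for the episode to end.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a random walk with tabular TD(0)."""
    rng = random.Random(seed)
    # Index 0 and index num_states + 1 are terminal states with value 0.
    values = [0.0] + [0.5] * num_states + [0.0]
    for _ in range(episodes):
        state = (num_states + 1) // 2  # start in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            # Assumed reward: 1 for stepping off the right end, 0 otherwise.
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # TD error: how much better or worse things look after one step.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]


estimates = td0_random_walk()
# For this walk, the true value of state i is the chance of exiting right: i / 6.
true_values = [i / 6 for i in range(1, 6)]
for est, true in zip(estimates, true_values):
    print(f"estimate={est:.3f}  true={true:.3f}")
```

Because the step size is constant, the estimates keep fluctuating slightly around the true values instead of settling exactly. A smaller `alpha` reduces that noise but slows learning.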

Perhaps equally influential was their textbook, Reinforcement Learning: An Introduction (1998), which is still the standard reference in the field and has been cited over 75,000 times. It allowed thousands of researchers to understand and contribute to this emerging field and continues to inspire much significant research activity in computer science today.

Although Barto and Sutton’s algorithms were developed decades ago, major advances in the practical applications of RL came about in the past fifteen years by merging RL with deep learning algorithms (pioneered by 2018 Turing Awardees Bengio, Hinton, and LeCun). This led to the technique of deep reinforcement learning.

The most prominent example of RL was the victory by the AlphaGo computer program over the best human Go players in 2016 and 2017. Another major recent achievement has been the development of the chatbot ChatGPT. ChatGPT is a large language model (LLM) trained in two phases, the second of which employs a technique called reinforcement learning from human feedback (RLHF) to capture human expectations.

RL has achieved success in many other areas as well. A high-profile research example is robot motor skill learning: the in-hand robotic manipulation and solution of a physical Rubik’s Cube, which showed that it is possible to do all the reinforcement learning in simulation and still succeed in the significantly different real world.

Other areas include network congestion control, chip design, internet advertising, optimization, global supply chain optimization, improving the behavior and reasoning capabilities of chatbots, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication.

Finally, a technology that was partly inspired by neuroscience has returned the favor. Recent research, including work by Barto, has shown that specific RL algorithms developed in AI provide the best explanations for a wide range of findings concerning the dopamine system in the human brain.

Biographical Background

Andrew Barto is Professor Emeritus, Department of Information and Computer Sciences, University of Massachusetts, Amherst. He began his career at UMass Amherst as a postdoctoral Research Associate in 1977, and has subsequently held various positions including Associate Professor, Professor, and Department Chair. Barto received a BS degree in Mathematics (with distinction) from the University of Michigan, where he also earned his MS and PhD degrees in Computer and Communication Sciences.

Barto’s honors include the UMass Neurosciences Lifetime Achievement Award, the IJCAI Award for Research Excellence, and the IEEE Neural Network Society Pioneer Award. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and a Fellow of the American Association for the Advancement of Science (AAAS).

Richard Sutton is a Professor in Computing Science at the University of Alberta, a Research Scientist at Keen Technologies (an artificial general intelligence company based in Dallas, Texas), and Chief Scientific Advisor of the Alberta Machine Intelligence Institute (Amii). Sutton was a Distinguished Research Scientist at DeepMind from 2017 to 2023. Prior to joining the University of Alberta, he served as a Principal Technical Staff Member in the Artificial Intelligence Department at the AT&T Shannon Laboratory in Florham Park, New Jersey, from 1998 to 2002. Sutton’s collaborations with Andrew Barto began in 1978 at the University of Massachusetts at Amherst, where Barto was Sutton’s PhD and postdoctoral advisor. Sutton received his BA in Psychology from Stanford University and earned his MS and PhD degrees in Computer and Information Science from the University of Massachusetts at Amherst.

Sutton’s honors include the IJCAI Award for Research Excellence, a Lifetime Achievement Award from the Canadian Artificial Intelligence Association, and an Outstanding Achievement in Research Award from the University of Massachusetts at Amherst. Sutton is a Fellow of the Royal Society of London, a Fellow of the Association for the Advancement of Artificial Intelligence, and a Fellow of the Royal Society of Canada.