
Gerald Tesauro

American computer scientist known for reinforcement learning and AI game playing

From Wikipedia, the free encyclopedia


Gerald J. "Gerry" Tesauro is an American computer scientist and a researcher at IBM, known for developing TD-Gammon, a backgammon program that taught itself to play at world-championship level through self-play and temporal-difference learning, an early success for reinforcement learning and neural networks. He later worked on autonomic computing and multi-agent systems for e-commerce, and contributed to the game-strategy algorithms for IBM Watson.


Career


Education

Tesauro earned a B.S. in physics from the University of Maryland, College Park. He then pursued graduate studies in plasma physics at Princeton University, supported by a Hertz Foundation Fellowship starting in 1980.[1] He completed his Ph.D. in theoretical physics in 1986 under the supervision of Nobel laureate Philip W. Anderson.[2]

Backgammon

After completing his Ph.D., he undertook postdoctoral research at the Center for Complex Systems Research, University of Illinois at Urbana-Champaign.[3][4] During this period, he began applying neural networks to games, co-authoring a NeurIPS paper in 1987 with Terrence Sejnowski on a neural network that learned to play backgammon.[5] By the late 1980s, Tesauro joined IBM's Thomas J. Watson Research Center (IBM Research) as a research scientist, where he would spend several decades, eventually rising to the position of Principal Research Staff Member in AI Science.[1]

In the late 1980s, he developed Neurogammon, a backgammon program trained on expert human games using supervised learning. Neurogammon won the backgammon tournament at the 1st Computer Olympiad in 1989, demonstrating the potential of neural networks in game AI.[3]

Between 1990 and 1998, he developed TD-Gammon, using reinforcement learning, specifically temporal-difference (TD) learning. TD-Gammon learned through self-play, using a neural network to evaluate board positions and improving its strategy over millions of games. The program achieved world-championship-level play, capable of challenging top human players.[6] It is widely regarded as an early success of neural networks, machine learning, and RL, and is often cited as a precursor in publications on later game-playing systems such as AlphaZero.[7]
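
The temporal-difference idea at the heart of TD-Gammon can be sketched on a toy problem. TD-Gammon itself used TD(λ) with a neural network over backgammon positions; the minimal illustration below (all names invented) instead uses a lookup table on a seven-state random walk, where states 0 and 6 are terminal (loss and win) and the learned values of states 1 to 5 should approach 1/6 through 5/6:

```python
import random

def td_random_walk(episodes=5000, alpha=0.1, seed=0):
    """TD(0) value learning on a 7-state random walk (toy sketch)."""
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(7)}   # initial value estimates
    V[0], V[6] = 0.0, 1.0            # terminal values are fixed: loss / win
    for _ in range(episodes):
        s = 3                        # each "game" starts in the middle
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))    # self-play: a random move
            # TD(0) update: move V(s) toward the bootstrapped target V(s')
            V[s] += alpha * (V[s_next] - V[s])
            s = s_next
    return V
```

The key property, shared with TD-Gammon, is that the program needs no expert labels: the only ground truth is the final outcome, which propagates backward through the bootstrapped updates.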

During this period, Tesauro also contributed to computer chess research at IBM, exploring machine learning methods for training evaluation functions, although the main Deep Blue project was led by others. In particular, some of the linear evaluation-function weights, primarily those for king safety, were trained by discretized comparison training.[8][9] From 2010, he also contributed to computer Go through work on the program Fuego.[3]
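
Comparison training can be sketched as a perceptron-style rule: given pairs of positions where an expert preferred the first, nudge the weights whenever the linear evaluator ranks the pair the wrong way. The features and update rule below are a simplified illustration, not the discretized procedure actually used for Deep Blue:

```python
def comparison_train(pairs, n_features, epochs=20, lr=0.1):
    """Learn linear evaluation weights from (preferred, other) feature pairs."""
    w = [0.0] * n_features
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    for _ in range(epochs):
        for preferred, other in pairs:
            if score(preferred) <= score(other):   # ranked wrongly: update
                w = [wi + lr * (p - o)
                     for wi, p, o in zip(w, preferred, other)]
    return w
```

The appeal of the method is that expert games supply preferences (which move was chosen) far more readily than absolute position scores.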

E-commerce

In the late 1990s, Tesauro shifted his focus to multi-agent systems and their application in e-commerce, such as autonomous "pricebots", software agents designed to learn optimal pricing and bidding strategies in electronic marketplaces.[10] Methods included Q-learning of dynamic pricing strategies (e.g., cooperation or undercutting) in competitive environments.[11][12] This was an early application of multi-agent reinforcement learning to economic modeling and automated trading. He also explored applying neural networks to computer virus detection.[13]
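
A minimal "pricebot" of this flavor can be sketched with stateless Q-learning against a rival that always posts a high price. The profit numbers and price grid below are invented for illustration, not taken from the cited papers; the point is only the update rule Q(a) += α(r − Q(a)):

```python
import random

def train_pricebot(rounds=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning of a posted price (toy sketch)."""
    rng = random.Random(seed)
    prices = [0.8, 1.0]              # undercut vs. match the rival's price
    Q = {p: 0.0 for p in prices}
    def profit(p):
        # Rival always posts 1.0: undercutting captures the whole market,
        # matching splits it evenly.
        return p if p < 1.0 else p / 2
    for _ in range(rounds):
        if rng.random() < epsilon:
            a = rng.choice(prices)             # explore
        else:
            a = max(Q, key=Q.get)              # exploit current estimate
        Q[a] += alpha * (profit(a) - Q[a])     # one-step Q-learning update
    return Q
```

Under this fixed rival, the agent learns that undercutting (profit 0.8) beats matching (profit 0.5); the interesting dynamics studied in the pricebot work arise when both sellers learn simultaneously.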

Autonomic computing

From the early 2000s, Tesauro became a key contributor to IBM's autonomic computing initiative, which aimed to create self-managing IT systems. He applied reinforcement learning to automate tasks like resource allocation, performance tuning, and power management in data centers and distributed systems. Examples include multiple cooperating RL agents that learned to optimize server resources (CPU, memory, power) to meet performance goals or minimize energy consumption.[14][15][16][17]
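
The resource-allocation flavor of this work can be illustrated with a utility-based sketch: assign identical servers to applications so as to maximize total utility. The concave utility functions and greedy-by-marginal-gain rule below are invented for illustration (greedy is optimal when utilities are concave), not IBM's actual models:

```python
import heapq

def allocate_servers(n_servers, utilities):
    """Greedily assign servers to the application with the largest marginal gain.

    utilities: list of functions u(k) giving the value of k servers for app i.
    """
    alloc = [0] * len(utilities)
    # Max-heap of marginal gains (negated, since heapq is a min-heap).
    heap = [(-(u(1) - u(0)), i) for i, u in enumerate(utilities)]
    heapq.heapify(heap)
    for _ in range(n_servers):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        u, k = utilities[i], alloc[i]
        heapq.heappush(heap, (-(u(k + 1) - u(k)), i))   # next marginal gain
    return alloc
```

In the RL versions of this problem, the utility of an allocation is not given in closed form but estimated online from observed performance, which is what made learning approaches attractive.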

Tesauro is listed as an inventor on numerous U.S. patents, filed primarily between 2004 and 2007 and largely focused on autonomic computing and AI applications for systems management. These include methods for reward-based learning of system policies, utility-based dynamic resource allocation, and autonomic model transfer in computing systems.[18]

IBM Watson

Around 2009, Tesauro joined the IBM Research team, led by David Ferrucci,[3] that developed IBM Watson, the question-answering system famous for defeating human champions Ken Jennings and Brad Rutter on the quiz show Jeopardy! in 2011.

Tesauro focused on Watson's game strategy components, including algorithms for buzzer timing, clue selection, and wagering decisions (especially for Daily Doubles and Final Jeopardy!). He and colleagues developed a Game State Evaluator and used simulation-based optimization, employing techniques from Bayesian inference, game theory, dynamic programming, and reinforcement learning to refine Watson's strategic play. These strategic algorithms contributed significantly to Watson's success, enabling it to manage risk effectively and make near-optimal wagering decisions.[19][20][21]
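
The wagering side of this work can be sketched by enumeration over a deliberately simplified model. The model below is hypothetical and not Watson's actual strategy: it assumes two players, a trailing opponent who wagers everything, and independent answer outcomes with assumed accuracies:

```python
def best_final_wager(my_score, opp_score, p_me=0.6, p_opp=0.6, step=1000):
    """Pick the Final Jeopardy! wager maximizing win probability (toy model)."""
    def win_prob(wager):
        total = 0.0
        for me_right, p1 in ((True, p_me), (False, 1 - p_me)):
            mine = my_score + wager if me_right else my_score - wager
            for opp_right, p2 in ((True, p_opp), (False, 1 - p_opp)):
                # Simplifying assumption: the opponent bets everything.
                theirs = opp_score * 2 if opp_right else 0
                if mine > theirs:           # strict win; ties count as losses
                    total += p1 * p2
        return total
    return max(range(0, my_score + 1, step), key=win_prob)
```

Even this crude model recovers the classic "cover bet": a leader with 20000 against 15000 should wager more than 10000 so that a correct answer guarantees the win, while not risking falling to zero.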

During this time, Tesauro also continued research in core AI algorithms, co-authoring a paper on Monte Carlo Simulation Balancing with David Silver (later of DeepMind) at ICML 2009.[22] After Watson, Tesauro continued research at IBM, on areas such as deep reinforcement learning,[23] hierarchical RL, multi-agent systems,[24] and continual learning.[25]


Honors and awards


References
