Reinforcement Learning | A history from Tic-Tac-Toe to Humanoids

Video Link: https://www.youtube.com/watch?v=q59j6ExuL7Y

How did AI systems learn to act and "feel"? I follow the history of Reinforcement Learning and the development of value functions, Q-functions, policy functions, and TD learning. The story starts with learning games (tic-tac-toe, checkers, backgammon) and moves on to physical problems: cart and pole, walking, and grasping (OpenAI's dexterous robotic hand). I found the history a bit of a mess, so I tried to clean it up. OpenAI o1

Thanks to Jane Street for sponsoring this video. They are hiring people interested in ML! To learn more about their work and open roles (and support me), visit their website: jane-st.co/ml

I also follow the process of transferring simulated skills to the real world (domain randomization) and witness the emergence of human-like behaviors in AI agents. It leaves us with a provocative question: where is the line between actions and words? What is the role of a GPT for actions?

Featuring insights from:
Claude Shannon
Arthur Samuel
Gerald Tesauro
Richard Sutton
David Silver
DeepMind, OpenAI, etc.

00:00 - Introduction
00:32 - Learning Tic Tac Toe
02:00 - Learning Cart and pole
04:20 - Shannon & Chess
06:50 - Samuel's Checkers
09:25 - TD Gammon (Gerald Tesauro)
11:00 - TD Learning
14:30 - Learning Atari (DQN)
17:28 - Direct Policy Gradient
19:40 - Domain Randomization