Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanisms!
Model-based next-state prediction and state-value prediction are slow to converge. To address this, we do two things: i) instead of a neural network world model, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) instead of learning state values, we guide the agent's actions with goal-directed exploration, using a neural network to choose the next action given the current state and the goal state (which we term the fast mechanism). The goal-directed network can be trained online at every time step, using hippocampal replay of visited states and imagined future states, leading to fast and efficient training! Every visited state can serve as both a start state and a goal state, maximising the value of every experience!
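To make the fast mechanism concrete, here is a minimal sketch (not the paper's actual code) of the replay-based update: every visited state on a trajectory is paired with every later state as a goal, and the action actually taken becomes the supervised target. The names (GoalNet, replay_update) and the PyTorch architecture are my own illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical goal-directed network: (current state, goal state) -> action logits.
class GoalNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def replay_update(net, optimizer, states, actions):
    """Hippocampal-replay-style update: every visited state can serve as a
    start state, and every later state on the trajectory as a goal state."""
    loss_fn = nn.CrossEntropyLoss()
    starts, goals, targets = [], [], []
    for t in range(len(actions)):
        for k in range(t + 1, len(states)):
            starts.append(states[t])
            goals.append(states[k])
            targets.append(actions[t])  # action actually taken at the start state
    logits = net(torch.stack(starts), torch.stack(goals))
    loss = loss_fn(logits, torch.tensor(targets))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this scheme, a single trajectory of length T yields on the order of T^2 supervised (state, goal) -> action pairs, which is what lets every experience be reused so heavily.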
The slow mechanism (memory retrieval) is slow at inference but fast to adapt to environmental changes, while the fast mechanism (goal-directed neural network) is fast at inference but slow to adjust to environmental changes. Both mechanisms are crucial for effective behaviour, and together they surpass reward-based methods at navigating a 10x10 grid world.
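Similarly, here is a rough sketch of what the slow mechanism could look like, assuming a simple dictionary memory of observed transitions (TransitionMemory, store, and plan are hypothetical names): planning is a search through memory, which is slow at inference time, but adapting to a changed environment is just overwriting one entry.

```python
from collections import deque

class TransitionMemory:
    """Stores observed transitions (state, action) -> next_state.
    Overwriting an entry immediately adapts the model to a changed environment."""
    def __init__(self):
        self.transitions = {}  # (state, action) -> next_state

    def store(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def plan(self, start, goal):
        """Breadth-first search through memory for a sequence of actions
        from start to goal; returns None if no known path exists.
        Scanning memory at every expansion is what makes this 'slow'."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for (s, a), s_next in self.transitions.items():
                if s == state and s_next not in visited:
                    visited.add(s_next)
                    frontier.append((s_next, path + [a]))
        return None
```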
Humans typically do not act randomly, but in pursuit of goals. I posit that the future of RL will be to model these goals and sub-goals, and to plan them out in a goal-directed, memory-based approach!
~~~~~~~~
This is Part 2 of "A New Framework of Memory for Learning".
See Part 1 here: https://www.youtube.com/watch?v=q9uMEAcB3lM
Paper can be found at: https://arxiv.org/abs/2301.13758
Slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning.pdf
Updated (summarized) slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning%20(Summary).pptx
See updated idea here: https://www.youtube.com/watch?v=Hr9zW7Usb7I
See previous idea on Hippocampal Replay (NeurIPS MemARI workshop 2022) here: https://www.youtube.com/watch?v=SG02XgfzxEg
0:00 Motivation
1:40 Typical Reinforcement Learning is Slow
9:24 Hippocampal Replay and NeurIPS 2022 MemARI workshop paper
12:47 Markov Decision Process is computationally expensive to model
16:03 A New RL Paradigm
38:32 Do humans think by value functions, or by memory?
41:56 How to make RL more like SL
47:07 Goal-Directed Exploration
51:38 Two Networks - Fast and Slow
53:50 Memory retrieval mechanism (Slow)
58:57 Overall Procedure using Memory
1:04:31 Goal-Directed Neural Network Update (Fast)
1:12:46 Experimental Validation (Static)
1:14:44 Experimental Validation (Random Start, Random Goal States, Changing Environment)
1:18:36 Live Code Walkthrough
1:37:46 Reward (or Goal-Directed) and Pain Pathways
1:40:23 Discussion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin