Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanism!

Video Link: https://www.youtube.com/watch?v=M10f3ihj3cE



Duration: 1:54:43


Model-based next-state prediction and state-value prediction are both slow to converge. To address this, we do the following: i) instead of predicting the next state with a neural network, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) instead of learning state values, we guide the agent's actions with goal-directed exploration, using a neural network that chooses the next action given the current state and the goal state (which we term the fast mechanism). The goal-directed network can be trained online at every time step using hippocampal replay of visited states and imagined future states, leading to fast and efficient training! Every visited state can serve as both a start state and a goal state, maximising the value of every single experience!
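To make the replay idea concrete, here is a minimal Python sketch of how one trajectory could be relabelled into supervised training examples for a goal-directed network. This is an illustration under stated assumptions, not the paper's code: the function name `replay_examples`, the integer action ids, and the choice to label each (start, goal) pair with the first action taken from the start state are all hypothetical, and imagined future states are left out.

```python
from typing import List, Tuple

State = Tuple[int, int]  # assumed (row, col) grid position


def replay_examples(
    states: List[State], actions: List[int]
) -> List[Tuple[State, State, int]]:
    """Relabel one trajectory into (start, goal, action) examples.

    actions[i] is the action that moved the agent from states[i] to
    states[i + 1], so there is exactly one more state than actions.
    Every visited state is used as a start, and every state reached
    later in the same trajectory is used as a goal.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("need exactly one more state than actions")
    examples = []
    for i in range(len(actions)):
        for j in range(i + 1, len(states)):
            # Assumption: the action taken at step i is a valid label
            # for reaching any later state on this trajectory.
            examples.append((states[i], states[j], actions[i]))
    return examples


trajectory = [(0, 0), (0, 1), (1, 1), (1, 2)]
moves = [3, 1, 3]  # hypothetical action ids: 3 = right, 1 = down
pairs = replay_examples(trajectory, moves)
print(len(pairs))  # 6 examples from only 3 transitions
```

A network could then be fit on these triples with an ordinary classification loss, which is one way to read "making RL more like SL".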

The slow mechanism (memory retrieval) is slow to query but adapts quickly to environmental changes, while the fast mechanism (goal-directed neural network) is fast at inference but takes longer to adjust to an environmental change. Both mechanisms are crucial, and they surpass reward-based mechanisms in navigating a 10x10 grid world.
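The trade-off of the slow mechanism can be sketched in Python as follows. This is a rough illustration, not the implementation from the paper: the memory is assumed to be a dictionary of observed transitions, and a sequential breadth-first search stands in for the parallel memory retrieval described above. The names `Memory` and `plan` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # assumed (row, col) grid position
Memory = Dict[Tuple[State, str], State]  # (state, action) -> next state


def plan(memory: Memory, start: State, goal: State) -> Optional[List[str]]:
    """Breadth-first search over remembered transitions.

    Returns a list of actions from start to goal, or None if the
    memory holds no route.
    """
    frontier = deque([start])
    came_from = {start: None}  # state -> (previous state, action)
    while frontier:
        state = frontier.popleft()
        if state == goal:
            actions = []
            while came_from[state] is not None:
                state, action = came_from[state]
                actions.append(action)
            return actions[::-1]
        # Scanning the whole memory on every expansion is what makes
        # this mechanism slow to query.
        for (s, action), nxt in memory.items():
            if s == state and nxt not in came_from:
                came_from[nxt] = (state, action)
                frontier.append(nxt)
    return None


memory: Memory = {
    ((0, 0), "right"): (0, 1),
    ((0, 1), "down"): (1, 1),
    ((0, 0), "down"): (1, 0),
    ((1, 0), "right"): (1, 1),
}
before = plan(memory, (0, 0), (1, 1))
print(before)  # ['right', 'down']

# The environment changes: "down" from (0, 1) is now blocked, so the
# agent observes that it stays in place. One overwrite updates the memory.
memory[((0, 1), "down")] = (0, 1)
after = plan(memory, (0, 0), (1, 1))
print(after)  # ['down', 'right']
```

Adapting takes a single memory write with no retraining, whereas a neural network would need further gradient updates before it stops proposing the blocked move.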

Humans typically do not act randomly; they act in pursuit of goals. I posit that the future of RL will be to model these goals and sub-goals, and to plan them out with a goal-directed, memory-based approach!

~~~~~~~~
This is Part 2 of "A New Framework of Memory for Learning".
See Part 1 here: https://www.youtube.com/watch?v=q9uMEAcB3lM

Paper can be found at: https://arxiv.org/abs/2301.13758
Slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning.pdf
Updated (summarized) slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning%20(Summary).pptx

See updated idea here: https://www.youtube.com/watch?v=Hr9zW7Usb7I
See previous idea on Hippocampal Replay (NeurIPS memARI workshop 2022) here: https://www.youtube.com/watch?v=SG02XgfzxEg

0:00 Motivation
1:40 Typical Reinforcement Learning is Slow
9:24 Hippocampal Replay and NeurIPS 2022 memARI workshop paper
12:47 Markov Decision Process is computationally expensive to model
16:03 A New RL Paradigm
38:32 Do humans think by value functions, or by memory?
41:56 How to make RL more like SL
47:07 Goal-Directed Exploration
51:38 Two Networks - Fast and Slow
53:50 Memory retrieval mechanism (Slow)
58:57 Overall Procedure using Memory
1:04:31 Goal-Directed Neural Network Update (Fast)
1:12:46 Experimental Validation (Static)
1:14:44 Experimental Validation (Random Start, Random Goal States, Changing Environment)
1:18:36 Live Code Walkthrough
1:37:46 Reward (or Goal-Directed) and Pain Pathways
1:40:23 Discussion

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2023-03-07 Using Transformers to mimic anyone's voice! - VALL-E (Part 1)
2023-02-28 Learning Part-Whole Structure by Chunking - More Efficient than Deep Learning!!!
2023-02-21 High-level planning with large language models - SayCan
2023-02-13 Learning, Fast and Slow: Towards Fast and Adaptable Agents in Changing Environments
2023-02-07 Using Logic Gates as Neurons - Deep Differentiable Logic Gate Networks!
2023-01-31 Learn from External Memory, not just Weights: Large-Scale Retrieval for Reinforcement Learning
2023-01-17 How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)
2023-01-09 HyperTree Proof Search - Automated Theorem Proving with AlphaZero and Transformers!
2022-12-23 CodinGame Fall Challenge 2022: A First Look (managed to get to Silver!)
2022-12-21 Can ChatGPT solve CodinGame/Google Kickstart problems?
2022-12-19 Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanism!
2022-12-12 A New Framework of Memory for Learning (Part 1)
2022-11-14 Hippocampal Replay for Learning (Full Length with Questions)
2022-11-14 Hippocampal Replay for Learning (3 min summary)
2022-11-07 AlphaTensor: Using Reinforcement Learning for Efficient Matrix Multiplication
2022-10-27 Playing Go on TyGem and learning from AI (~ 3 kyu)
2022-10-13 Heroes of Might and Magic III - Armageddon's Blade Campaign (First Playthrough) - Final!!!
2022-10-13 Heroes of Might and Magic III - Armageddon's Blade Campaign (First Playthrough) - Part 6
2022-10-11 Playing Go on Tygem + AI Analysis (~4 kyu)
2022-10-11 Heroes of Might and Magic III - Armageddon's Blade Campaign (First Playthrough) - Part 5
2022-10-11 Heroes of Might and Magic III - Armageddon's Blade Campaign (First Playthrough) - Part 4