Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanisms!
Model-based next-state prediction and state-value prediction are slow to converge. To address this, we do two things: i) instead of a neural network world model, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) instead of learning state values, we guide the agent's actions with goal-directed exploration, using a neural network to choose the next action given the current state and the goal state (which we term the fast mechanism). The goal-directed network can be trained online at every time step, using hippocampal replay of visited states and imagined future states, leading to fast and efficient training! Every visited state can serve as both a start state and a goal state, maximising the value of every experience!
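To make the fast mechanism concrete, here is a minimal sketch (not the paper's actual code) of the replay-based update: every visited state on a trajectory is paired with every later state as a goal, and the action actually taken becomes the supervised target. The names (GoalNet, replay_update) and the PyTorch architecture are my own illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical goal-directed network: (current state, goal state) -> action logits.
class GoalNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def replay_update(net, optimizer, states, actions):
    """Hippocampal-replay-style update: every visited state can serve as a
    start state, and every later state on the trajectory as a goal state."""
    loss_fn = nn.CrossEntropyLoss()
    starts, goals, targets = [], [], []
    for t in range(len(actions)):
        for k in range(t + 1, len(states)):
            starts.append(states[t])
            goals.append(states[k])
            targets.append(actions[t])  # action actually taken at the start state
    logits = net(torch.stack(starts), torch.stack(goals))
    loss = loss_fn(logits, torch.tensor(targets))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this scheme, a single trajectory of length T yields on the order of T^2 supervised (state, goal) -> action pairs, which is what lets every experience be reused so heavily.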
The slow mechanism (memory retrieval) is slow at inference but fast to adapt to environmental changes, while the fast mechanism (goal-directed neural network) is fast at inference but slow to adjust to environmental changes. Both mechanisms are crucial for effective behaviour, and together they surpass reward-based methods at navigating a 10x10 grid world.
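Similarly, here is a rough sketch of what the slow mechanism could look like, assuming a simple dictionary memory of observed transitions (TransitionMemory, store, and plan are hypothetical names): planning is a search through memory, which is slow at inference time, but adapting to a changed environment is just overwriting one entry.

```python
from collections import deque

class TransitionMemory:
    """Stores observed transitions (state, action) -> next_state.
    Overwriting an entry immediately adapts the model to a changed environment."""
    def __init__(self):
        self.transitions = {}  # (state, action) -> next_state

    def store(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def plan(self, start, goal):
        """Breadth-first search through memory for a sequence of actions
        from start to goal; returns None if no known path exists.
        Scanning memory at every expansion is what makes this 'slow'."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for (s, a), s_next in self.transitions.items():
                if s == state and s_next not in visited:
                    visited.add(s_next)
                    frontier.append((s_next, path + [a]))
        return None
```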
Humans typically do not act randomly, but in pursuit of goals. I posit that the future of RL will be to model these goals and sub-goals, and to plan them out in a goal-directed, memory-based approach!
~~~~~~~~
This is Part 2 of "A New Framework of Memory for Learning".
See Part 1 here: https://www.youtube.com/watch?v=q9uMEAcB3lM
Paper can be found at: https://arxiv.org/abs/2301.13758
Slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning.pdf
Updated (summarized) slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/A%20New%20Framework%20of%20Memory%20for%20Learning%20(Summary).pptx
See updated idea here: https://www.youtube.com/watch?v=Hr9zW7Usb7I
See previous idea on Hippocampal Replay (NeurIPS MemARI workshop 2022) here: https://www.youtube.com/watch?v=SG02XgfzxEg
0:00 Motivation
1:40 Typical Reinforcement Learning is Slow
9:24 Hippocampal Replay and NeurIPS 2022 MemARI workshop paper
12:47 Markov Decision Process is computationally expensive to model
16:03 A New RL Paradigm
38:32 Do humans think by value functions, or by memory?
41:56 How to make RL more like SL
47:07 Goal-Directed Exploration
51:38 Two Networks - Fast and Slow
53:50 Memory retrieval mechanism (Slow)
58:57 Overall Procedure using Memory
1:04:31 Goal-Directed Neural Network Update (Fast)
1:12:46 Experimental Validation (Static)
1:14:44 Experimental Validation (Random Start, Random Goal States, Changing Environment)
1:18:36 Live Code Walkthrough
1:37:46 Reward (or Goal-Directed) and Pain Pathways
1:40:23 Discussion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin