Learning, Fast and Slow: Towards Fast and Adaptable Agents in Changing Environments
Excited to share my latest work titled "Learning, Fast and Slow".
The inspiration came from observing how humans are typically goal-directed and head straight towards a planned goal, and how we can retrieve past memories in order to perform a task consistently. By combining fast neural network prediction (System 1) with a slow, parallel memory-retrieval system (System 2) that overrides the fast system whenever a relevant memory is available, we can get efficient decision making.
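To make this concrete, here is a minimal sketch of that action-selection loop, assuming a small feed-forward policy for System 1 and a hashtable-backed memory for System 2. All names here (FastPolicyNet, select_action, the memory dict) are illustrative assumptions, not the paper's actual API:

# Minimal sketch of fast-and-slow action selection (illustrative only).
import torch
import torch.nn as nn

class FastPolicyNet(nn.Module):
    """System 1: a small feed-forward net mapping (state, goal) to action logits."""
    def __init__(self, state_dim, goal_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def select_action(state, goal, fast_net, memory):
    """System 2 (memory retrieval) overrides System 1 (fast net) when it can."""
    key = (tuple(state.tolist()), tuple(goal.tolist()))
    if key in memory:                   # exact memory hit: the slow system wins
        return memory[key]
    logits = fast_net(state, goal)      # otherwise fall back to the fast net
    return int(logits.argmax())

A hashtable-backed memory is one simple way to realise the retrieval side (the video's "Neural Networks vs Hashtable for Memory" chapter weighs exactly this choice).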
I was also inspired by hippocampal replay in mice, as there seems to be both replay of previously visited states and replay of imagined future states. Hence, I posit that this replay could help in goal-directed learning, and we train the fast neural network with this in mind (a rough sketch follows below).
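Below is a rough sketch of how such replay-based training could look, assuming the replayed goal is simply a state reached later in the same episode and the taken action is used as supervision. This relabelling scheme is my reading of goal-directed replay, not the paper's exact procedure:

# Illustrative sketch of training the fast net from replayed trajectories.
import random
import torch
import torch.nn.functional as F

def replay_update(fast_net, optimizer, trajectory):
    """trajectory: list of (state, action) pairs from one episode."""
    states = [s for s, _ in trajectory]
    losses = []
    for t, (s, a) in enumerate(trajectory[:-1]):
        # "Replay": pick a state reached later in the episode as the goal,
        # and train the fast net to reproduce the action actually taken.
        g = states[random.randrange(t + 1, len(states))]
        logits = fast_net(s, g)
        losses.append(F.cross_entropy(logits.unsqueeze(0), torch.tensor([a])))
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()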
Overall, the initial results are promising: empirical studies show that our proposed method achieves a 92% solve rate across 100 episodes in a dynamically changing grid world, significantly outperforming state-of-the-art actor-critic methods such as PPO (54%), TRPO (50%) and A2C (24%).
Happy to discuss this further and take any feedback. There is a lot more I have in mind to add to this system; more will be shared in my next framework paper, to be released shortly.
Paper: https://arxiv.org/abs/2301.13758
Code: https://github.com/tanchongmin/Learning-Fast-and-Slow
Slides: https://github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews
Earlier iteration of this idea: https://www.youtube.com/watch?v=M10f3ihj3cE
Related Papers:
SayCan (LLMs for Goal Setting): https://say-can.github.io/
Yann LeCun's Path Towards Autonomous Machine Intelligence (Hierarchical Planning): https://openreview.net/pdf?id=BZ5a1r-kVsf
~~~~~~~~~~~~~~~~~~~~~~~~~~
0:00 Introduction
2:03 Aim
4:37 Traditional Reinforcement Learning
6:36 Thought Experiment: How do we think?
9:52 Preliminaries: Modelling the world
29:49 Neural Networks vs Hashtable for Memory
35:45 Value-based RL is slow
40:04 Goal-Directed Exploration
45:28 Fast & Slow
56:41 Hippocampal Replay to train Fast Goal-Directed Neural Network
1:02:19 Massively Parallel Memory Retrieval Network
1:04:13 Overall Fast & Slow Procedure
1:10:13 Results (Static)
1:14:26 Results (Dynamic)
1:18:00 Results (Ablation)
1:22:42 Conclusion
1:24:10 Extensions
1:24:23 Multi-Agent Learning
1:26:08 Natural Language Processing for Goal and Subgoal Planning
1:29:16 Forgetting as Learning
1:31:37 Hierarchical Planning
1:35:14 Discussion
2:19:20 Concluding Remarks
~~~~~~~~~~~~~~~~~~~~~~~~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin