PhD Thesis Overview (Part 1): Reward is not enough; Towards Goal-Directed, Memory-based Learning
Going through the key insights from my 5 years of PhD next Monday.
The three main takeaways are:
Reward-based learning is slow to learn and slow to adapt to changes in the environment or reward
Goal-directed, memory-based learning learns very quickly and outperforms reward-based learning
Adding Large Language Models (LLMs) with suitable abstraction spaces to such a goal-directed, memory-based learning system lets it use pre-built knowledge and learn even faster (provided the test environment is within the LLMs' training data)
Abstract:
Humans excel at fast and adaptive learning, effortlessly making zero-shot associations and generalising across diverse environments from minimal experience. This is in stark contrast to data-hungry deep learning algorithms. This work draws inspiration from human cognitive processes to build AI systems that learn and adapt quickly.
We introduce Learning, Fast and Slow (Best Paper Finalist at IEEE ICDL 2023), a system that uses a neural network to perform goal-directed exploration (the “fast” mechanism) and additionally performs memory-based planning (the “slow” mechanism). Trained online via memory replay in a self-supervised fashion, this method achieves a 91.9% solve rate in a dynamically changing 10x10 maze, significantly better than actor-critic methods such as PPO (61.2%), TRPO (26.1%) and A2C (23.9%).
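To make the idea of memory-based planning a little more concrete, here is a minimal Python sketch under loud assumptions: maze states are (row, column) tuples, memory is a hypothetical `TransitionMemory` class that stores observed (state, action, next state) transitions, and planning is plain breadth-first search over those remembered transitions. This is not the Learning, Fast and Slow implementation (which also uses a neural network for goal-directed exploration); it only illustrates why a memory-based planner can change its route after a single new observation.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # assumed (row, column) grid position
Action = str


class TransitionMemory:
    """Hypothetical episodic memory of observed (state, action) -> next_state transitions."""

    def __init__(self) -> None:
        self.graph: Dict[State, Dict[Action, State]] = {}

    def store(self, state: State, action: Action, next_state: State) -> None:
        # Overwrite on re-observation so memory always reflects the latest dynamics
        self.graph.setdefault(state, {})[action] = next_state

    def plan(self, start: State, goal: State) -> Optional[List[Action]]:
        # Breadth-first search: shortest action sequence using only remembered transitions
        frontier = deque([start])
        parent: Dict[State, Tuple[State, Action]] = {}
        visited = {start}
        while frontier:
            state = frontier.popleft()
            if state == goal:
                # Walk back through parents to recover the action sequence
                actions: List[Action] = []
                while state != start:
                    prev, action = parent[state]
                    actions.append(action)
                    state = prev
                return actions[::-1]
            for action, nxt in self.graph.get(state, {}).items():
                if nxt not in visited:
                    visited.add(nxt)
                    parent[nxt] = (state, action)
                    frontier.append(nxt)
        # Goal not reachable from memory: an agent would fall back to exploration here
        return None


memory = TransitionMemory()
memory.store((0, 0), "right", (0, 1))
memory.store((0, 1), "down", (1, 1))
memory.store((0, 0), "down", (1, 0))
memory.store((1, 0), "right", (1, 1))
print(memory.plan((0, 0), (1, 1)))  # ['right', 'down']

# The environment changes: a wall now blocks moving down from (0, 1)
memory.store((0, 1), "down", (0, 1))
print(memory.plan((0, 0), (1, 1)))  # ['down', 'right']
```

Because the plan is recomputed from memory on every call, one new observation (the wall) changes the route immediately, whereas a reward-based learner would typically need many further updates before its policy shifts.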
We also use a similar memory-based, goal-directed approach to create TaskGen, an open-source agentic framework based on Large Language Models. TaskGen will continue to be developed under AgentJo (https://github.com/tanchongmin/agentjo)
~~~
Brick Tic Tac Toe Game (Level 2.2): https://simmer.io/@chongmin/cosmic-tic-tac-toe
Reference Papers / Videos:
Part 2: • PhD Thesis Overview (Part 2): LLMs fo...
DropNet: https://arxiv.org/abs/2207.06646
Brick Tic Tac Toe: https://arxiv.org/abs/2207.05991
Hippocampal Replay (NeurIPS memARI workshop 2022): https://memari-workshop.github.io/papers/paper_38.pdf
Video: • Hippocampal Replay for Learning (3 mi...
Learning, Fast and Slow: https://ieeexplore.ieee.org/abstract/document/10364540
arXiv: https://arxiv.org/pdf/2301.13758
Video: • Learning, Fast and Slow: My Landmark ...
LLMs as a system of multiple expert agents: https://ieeecai.org/2024/wp-content/pdfs/540900a793/540900a793.pdf
arXiv: https://arxiv.org/pdf/2310.05146
Video: • LLMs as a System of Multiple Expert A...
TaskGen: https://arxiv.org/pdf/2407.15734
Video: • TaskGen - A Task-based Agentic Framew...
AgentJo: https://github.com/tanchongmin/agentjo
~~~
0:00 Introduction
4:24 Overview of Insights
14:38 DropNet: Learning by Pruning
17:26 Brick Tic Tac Toe: Reward is not enough
39:56 Hippocampal Replay: Memory-based Learning
52:15 Learning, Fast and Slow: Goal-Directed, Memory-based Learning
1:25:18 Question: Can AI be aligned to be good for humanity?
1:31:10 Learning, Fast and Slow Empirical Results
1:35:50 Teaser for Future Session
1:37:50 Memory Abstraction Space Discussion
~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin