PhD Thesis Overview (Part 1): Reward is not enough; Towards Goal-Directed, Memory-based Learning

Channel:

John Tan Chong Min

Published on January 20, 2025 7:29:33 AM ● Video Link: https://www.youtube.com/watch?v=2tQl-N35DBM

1,324 views

Next Monday, I will go through the key insights from my 5 years of PhD.

The three main takeaways are:
Reward-based learning is slow to learn and slow to adapt to changes in the environment or reward
Goal-directed, memory-based learning learns very quickly and outperforms reward-based learning
Adding Large Language Models (LLMs) with suitable abstraction spaces to such a goal-directed, memory-based learning system lets it use pre-built knowledge and learn even faster (if the test environment is within the training dataset of the LLMs)

Abstract:
Humans excel at fast and adaptive learning, effortlessly making zero-shot associations and generalising across diverse environments with minimal experience needed. This is in stark contrast to data-hungry deep-learning algorithms. This work aims to draw inspiration from human cognitive processes to build AI systems that learn and adapt quickly.

We introduce Learning, Fast and Slow (Best Paper Finalist in IEEE ICDL 2023), a system which uses a neural network to perform goal-directed exploration (the “fast” mechanism), and additionally performs memory-based planning (the “slow” mechanism). Trained online via memory replay in a self-supervised fashion, this method achieves a 91.9% solve rate in a dynamically changing 10x10 maze, significantly better than actor-critic methods like PPO (61.2%), TRPO (26.1%) and A2C (23.9%).
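
As a rough illustration of the fast/slow split described above, here is a minimal Python sketch. It is not the paper's implementation. It assumes a deterministic grid maze, swaps the neural network for a hand-written Manhattan-distance heuristic (the "fast" proposal), and uses breadth-first search over remembered transitions as the "slow" planner. All function names and the visit-count tie-break are hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical names throughout; an illustrative sketch, not the paper's code.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def fast_proposal(state, goal):
    """Stand-in for the 'fast' neural network (assumption): rank actions by
    the Manhattan distance to the goal after taking them."""
    def dist_after(action):
        dr, dc = ACTIONS[action]
        return abs(state[0] + dr - goal[0]) + abs(state[1] + dc - goal[1])
    return sorted(ACTIONS, key=dist_after)


def slow_plan(memory, start, goal):
    """'Slow' mechanism: breadth-first search over remembered transitions.
    memory maps (state, action) -> next_state. Returns actions or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action in ACTIONS:
            nxt = memory.get((state, action))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


def choose_explore(state, goal, memory, visits):
    """Goal-directed exploration: prefer untried actions, avoid known wall
    bumps, then prefer less-visited next states; ties follow the fast ranking."""
    def score(action):
        nxt = memory.get((state, action))
        if nxt is None:
            return -1                # untried action
        if nxt == state:
            return float("inf")      # remembered as a wall bump
        return visits.get(nxt, 0)
    return min(fast_proposal(state, goal), key=score)


def step_env(walls, size, state, action):
    """Deterministic grid dynamics: walls and borders leave the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
        return state
    return nxt


def run_episode(walls, size, start, goal, memory, max_steps=200):
    """Returns the number of steps taken to reach the goal, or None."""
    state, visits = start, {start: 1}
    for t in range(max_steps):
        if state == goal:
            return t
        plan = slow_plan(memory, state, goal)
        action = plan[0] if plan else choose_explore(state, goal, memory, visits)
        nxt = step_env(walls, size, state, action)
        memory[(state, action)] = nxt  # online update overwrites stale memory
        visits[nxt] = visits.get(nxt, 0) + 1
        state = nxt
    return None
```

Passing the same `memory` dictionary to successive `run_episode` calls lets later episodes reuse earlier experience. Because each transition is overwritten as soon as it is re-observed, a wall that appears between episodes invalidates the old plan after a single failed step, which is the kind of fast adaptation the abstract contrasts with reward-based methods.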

We also utilise a similar memory-based, goal-directed approach to create an open-sourced Large Language Model-based agentic framework, TaskGen. This will continue to be developed under AgentJo (https://github.com/tanchongmin/agentjo).

~~~
Brick Tic Tac Toe Game (Level 2.2): https://simmer.io/@chongmin/cosmic-tic-tac-toe

Reference Papers / Video:
Part 2: PhD Thesis Overview (Part 2): LLMs fo...
DropNet: https://arxiv.org/abs/2207.06646
Brick Tic Tac Toe: https://arxiv.org/abs/2207.05991
Hippocampal Replay (NeurIPS memARI workshop 2022): https://memari-workshop.github.io/papers/paper_38.pdf
Video: Hippocampal Replay for Learning (3 mi...

Learning, Fast and Slow: https://ieeexplore.ieee.org/abstract/document/10364540
https://arxiv.org/pdf/2301.13758
Video: Learning, Fast and Slow: My Landmark ...

LLMs as a system of multiple expert agents: https://ieeecai.org/2024/wp-content/pdfs/540900a793/540900a793.pdf
https://arxiv.org/pdf/2310.05146
Video: LLMs as a System of Multiple Expert A...

TaskGen: https://arxiv.org/pdf/2407.15734
Video: TaskGen - A Task-based Agentic Framew...

AgentJo: https://github.com/tanchongmin/agentjo

~~~

0:00 Introduction
4:24 Overview of Insights
14:38 DropNet: Learning by Pruning
17:26 Brick Tic Tac Toe: Reward is not enough
39:56 Hippocampal Replay: Memory-based Learning
52:15 Learning, Fast and Slow: Goal-Directed, Memory-based Learning
1:25:18 Question: Can AI be made to align to be good for humanity?
1:31:10 Learning, Fast and Slow Empirical Results
1:35:50 Teaser for Future Session
1:37:50 Memory Abstraction Space Discussion

~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

Other Videos By John Tan Chong Min

2025-06-16	Universal Filter (Part 2): Time, Akashic Records, Individual Mind-based, Body-based memory
2025-06-04	Good Vibes Only with Dylan Chia: Lyria (Music), Veo3 (Video), Gamma (Slides), GitHub Copilot (Code)
2025-03-10	Memory Meets Psychology - Claude Plays Pokemon: How It works, How to improve it
2025-02-24	Vibe Coding: How to use LLM prompts to code effectively!
2025-01-26	PhD Thesis Overview (Part 2): LLMs for ARC-AGI, Task-Based Memory-Infused Learning, Plan for AgentJo
2025-01-20	PhD Thesis Overview (Part 1): Reward is not enough; Towards Goal-Directed, Memory-based Learning
2024-12-04	AgentJo CV Generator: Generate your CV by searching for your profile on the web!
2024-11-11	Can LLMs be used in self-driving? CoMAL: Collaborative Multi-Agent LLM for Mixed Autonomy Traffic
2024-10-28	From TaskGen to AgentJo: Creating My Life Dream of Fast Learning and Adaptable Agents
2024-10-21	Tian Yu X John: Discussing Practical Gen AI Tips for Image Prompting
2024-10-08	Jiafei Duan: Uncovering the 'Right' Representations for Multimodal LLMs for Robotics
2024-09-27	TaskGen Tutorial 6: Conversation Wrapper
2024-09-26	TaskGen Tutorial 5: External Functions & CodeGen
2024-09-24	TaskGen Tutorial 4: Hierarchical Agents
2024-09-23	TaskGen Tutorial 3: Memory
2024-09-19	TaskGen Tutorial 2: Shared Variables and Global Context
2024-09-16	Beyond Strawberry: gpt-o1 - Is LLM alone sufficient for reasoning?
2024-09-11	TaskGen Tutorial 1: Agents and Equipped Functions
2024-09-11	TaskGen Tutorial 0: StrictJSON
2024-09-10	LLM-Modulo: Using Critics and Verifiers to Improve Grounding of a Plan - Explanation + Improvements
2024-09-06	TaskGen: Co-create the best open-sourced LLM Agentic Framework together!
