I have been thinking about the problem of creating fast and adaptable agents for four years. Recently, I had a breakthrough: using goal-directed action prediction and imbuing the agent with memory. Here's the video documenting the insights.
I will be presenting this at the IEEE International Conference on Development and Learning (ICDL) 2023 from 9-11 Nov 2023 in Macau! I am very keen to continue working on this, as it is part of my 10-year plan to create fast and adaptable agents!
**Key Insights:**
- Use goal-directed action prediction so that we can do self-supervised learning on our own trajectories (given a start state and a goal state, predict the first action); see the first sketch after this list.
- Store (state, action, next state) tuples in memory and use them for world modelling and transition probability approximation (second sketch below).
- Memory learns almost immediately, while the neural network takes time to learn. Memory is also used for lookahead planning: if memory can find a path from the start state to the goal state, we use that path rather than the neural network's prediction (third sketch below).
- Performs very well (91.9% solve rate) compared to the next-best RL algorithm, Proximal Policy Optimisation (61.2% solve rate), in a 10x10 dynamic grid environment.
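
Here is a minimal sketch of the goal-directed action prediction idea in PyTorch. The state dimension, action count, network size, and the hindsight-style relabelling (treating every later state in a trajectory as a goal for every earlier state-action pair) are my assumptions for illustration, not details confirmed by the post:

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 2, 4   # hypothetical: (x, y) grid cell, 4 moves

class GoalDirectedPolicy(nn.Module):
    """Maps (current state, goal state) -> logits over the first action."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_ACTIONS),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def relabel(states, actions):
    """Self-supervised pairs from one trajectory: every later state
    serves as a goal for every earlier (state, action)."""
    return [(states[i], states[j], actions[i])
            for i in range(len(actions))
            for j in range(i + 1, len(states))]

policy = GoalDirectedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(batch):
    """One supervised step over relabelled (state, goal, action) triples."""
    s = torch.stack([b[0] for b in batch])   # float tensors, shape (STATE_DIM,)
    g = torch.stack([b[1] for b in batch])
    a = torch.tensor([b[2] for b in batch])  # integer action indices
    loss = loss_fn(policy(s, g), a)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Training then reduces to ordinary supervised classification over the relabelled triples, with no reward signal needed.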
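Below is one way the transition memory could look: a counting table over (state, action, next state) tuples that approximates transition probabilities by relative frequency. The class name and counting scheme are hypothetical; states are assumed to be hashable (e.g. grid coordinates as tuples):

```python
from collections import defaultdict

class TransitionMemory:
    """Stores (state, action, next_state) tuples and estimates
    transition probabilities by simple counting."""
    def __init__(self):
        # (state, action) -> {next_state: observation count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def transition_prob(self, state, action, next_state):
        """P(next_state | state, action), or None if (state, action)
        has never been observed (defer to the learned model then)."""
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return None
        return outcomes[next_state] / sum(outcomes.values())

    def successors(self, state):
        """All (action, next_state) transitions observed from `state`."""
        return [(a, s2)
                for (s, a), outs in self.counts.items() if s == state
                for s2 in outs]
```

Because it is just counting, this memory "learns" from a single observation, which is what lets it outpace the neural network early in training.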
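And a sketch of how the memory could drive lookahead planning, falling back to the neural network only when memory has no path. The breadth-first search and the dict-of-adjacency representation are my assumptions; `policy` stands for any callable that maps (state, goal) to an action:

```python
from collections import deque

def plan_with_memory(memory, start, goal):
    """Breadth-first search over remembered transitions.
    `memory` maps state -> list of (action, next_state) pairs.
    Returns the action sequence if a path exists, else None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action, nxt in memory.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # goal unreachable with what memory knows so far

def act(memory, policy, start, goal):
    """Prefer the memory-based plan; fall back to the neural policy."""
    path = plan_with_memory(memory, start, goal)
    if path:                        # memory knows a route: take its first step
        return path[0]
    return policy(start, goal)      # otherwise let the network decide
```

This mirrors the insight above: memory is exact and available immediately, so it takes priority, and the network only acts where memory has no coverage yet.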