GATO by DeepMind: One Transformer to rule them all?
A review of GATO, DeepMind's generalist agent architecture, along with some potential improvements.
Paper link: https://openreview.net/pdf?id=1ikK0kHjvj
Check out earlier videos for more background information on Transformers:
Decision Transformers: https://www.youtube.com/watch?v=AW7vHggnAps
Transformers (Part 1): https://www.youtube.com/watch?v=iBamMr2WEsQ
Transformers (Part 2): https://www.youtube.com/watch?v=oq0vj2pLrHQ
Slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations
0:00 Background on GATO
4:06 Multi-modal architecture
6:23 GATO Structure
10:11 Prediction Problem (Sequence)
15:03 Loss Function
16:53 Tokenization
19:24 Embedding Inputs
20:26 Image Embedding
22:01 Image + Discrete Actions Embedding
23:05 Proprioception + Continuous Actions Embedding
24:37 Local Position Embeddings
27:44 Training Details
30:13 Datasets
32:20 Training Procedure
36:00 Is the general agent as good as the expert?
38:45 Is GATO scalable?
43:23 Can GATO generalise (zero-shot)?
47:20 Can GATO generalise (few-shot)?
52:31 Discussion
~~~~~~~~~~~~~~~~~~
Discord: https://discord.gg/fXCZCPYs
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin