Transformers (Part 1)
Illustrates how a Transformer works from the very basics.
Covers the intuition behind attention, parallel processing instead of a hidden state, the embedding space, and the encoder structure of a Transformer.
Part 2 here: https://www.youtube.com/watch?v=oq0vj2pLrHQ
ChatGPT video here: https://www.youtube.com/watch?v=wA8rjKueB3Q
Slides can be found at:
https://github.com/tanchongmin/TensorFlow-Implementations
0:00 Introduction
1:43 Find Waldo
4:36 Memory Activity
11:25 Problems with RNNs / MLPs for sequential data
14:58 Transformer as a solution to these problems
18:30 AI Dungeon demonstration
27:55 Word tokenisation
33:18 One-hot encoding
34:18 Embedding space
38:12 Overall Transformer Architecture
41:48 Encoder-Decoder Structure
43:30 Self-Attention Block and how it helps with polysemy
1:00:17 Feed-forward block
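
To complement the one-hot encoding (33:18) and embedding space (34:18) sections, here is a minimal NumPy sketch (not from the video or the linked repo; vocabulary, dimensions, and random weights are purely illustrative) showing how a dense embedding lookup replaces a sparse one-hot vector:

```python
import numpy as np

# Toy vocabulary of 5 tokens; words and indices are hypothetical
vocab = ["the", "cat", "sat", "on", "mat"]
vocab_size = len(vocab)
d_model = 4  # embedding dimension, chosen arbitrarily for illustration

# One-hot encoding: each token becomes a sparse vector with a single 1
one_hot = np.eye(vocab_size)          # shape (5, 5)
print(one_hot[vocab.index("cat")])    # [0. 1. 0. 0. 0.]

# Embedding space: a dense lookup table replaces the sparse one-hot.
# Weights here are random; in a real Transformer they are learned.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))

# Looking up a row is equivalent to one_hot @ embedding_table,
# but implemented as a cheap index into the table.
cat_vector = embedding_table[vocab.index("cat")]
print(cat_vector)                     # dense 4-dimensional vector
```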
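And to complement the self-attention section (43:30), a minimal single-head scaled dot-product self-attention sketch (again a NumPy illustration with random, untrained projection matrices, not the video's or repo's code). Because every output vector is a weighted mix of all value vectors in the sequence, an ambiguous word's representation is shaped by its context, which is how attention helps with polysemy:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x.

    x:   (seq_len, d_model) token embeddings
    w_*: (d_model, d_k) projection matrices (random here; learned in practice)
    """
    q = x @ w_q                         # queries
    k = x @ w_k                         # keys
    v = x @ w_v                         # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # similarity of every token with every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                  # each output mixes all value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 4, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)   # (5, 4)
```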