How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)

Channel: John Tan Chong Min

Subscribers: 5,470

Published on January 17, 2023 12:39:18 PM ● Video Link: https://www.youtube.com/watch?v=wA8rjKueB3Q

Duration: 2:14:29

15,553 views


ChatGPT has recently been released by OpenAI, and it is fundamentally a next-token (next-word) prediction model: given a prompt, it predicts the next token(s). Trained on a massive internet corpus, it becomes very powerful and can perform many tasks zero-shot, such as summarization, code completion, and question answering.
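To make next-token prediction concrete, here is a minimal Python sketch. It is not how ChatGPT is implemented. GPT models compute next-token logits with a Transformer over the entire context, while this sketch swaps in a hypothetical hand-written bigram table (`BIGRAM_LOGITS`) that only looks at the last token. What it shares with GPT is the generation loop (turn logits into probabilities with a softmax, pick a token, append it, repeat) and the training signal (the cross-entropy of each actual next token given the tokens before it).

```python
import math

# Hypothetical toy "language model": a table of next-token logits keyed by the
# previous token. A real GPT computes logits with a Transformer over the whole context.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "a", "mat"]
BIGRAM_LOGITS = {
    "the": {"cat": 2.0, "mat": 1.0},
    "cat": {"sat": 2.5},
    "sat": {"on": 2.5},
    "on": {"a": 2.0},
    "a": {"mat": 2.0},
    "mat": {"<eos>": 3.0},
}

def next_token_logits(tokens):
    """One logit per vocabulary entry; unseen pairs get logit 0."""
    row = BIGRAM_LOGITS.get(tokens[-1], {})
    return [row.get(tok, 0.0) for tok in VOCAB]

def softmax(logits):
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=10):
    """Greedy decoding: append the most probable next token until <eos>."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(next_token_logits(tokens))
        best = VOCAB[probs.index(max(probs))]
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens

def next_token_loss(tokens):
    """Mean cross-entropy of predicting each token from the tokens before it."""
    losses = []
    for i in range(1, len(tokens)):
        probs = softmax(next_token_logits(tokens[:i]))
        losses.append(-math.log(probs[VOCAB.index(tokens[i])]))
    return sum(losses) / len(losses)

print(" ".join(generate(["the"])))  # the cat sat on a mat
```

Greedy decoding (always taking the most probable token) is used here only for simplicity; sampling from the softmax distribution, often with a temperature, is a common alternative that gives more varied outputs.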

Amidst the hype around ChatGPT, it is easy to assume that the model can reason and think for itself. Here, we try to demystify how the model works, starting with a basic introduction to Transformers and then showing how the model's output can be improved using Reinforcement Learning with Human Feedback (RLHF).
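As a rough illustration of two RLHF ingredients described in the InstructGPT paper from the references, here is a Python sketch that uses plain floats in place of real model outputs. A reward model is trained so that the response human labelers preferred scores higher than the rejected one (a pairwise log-sigmoid loss). During RL fine-tuning, the reward is then reduced by a penalty on how far the policy's log-probability drifts from the supervised (SFT) model's. The function names and the `beta` value are hypothetical choices for this sketch, not the paper's code.

```python
import math

def reward_model_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).

    r_chosen / r_rejected are reward-model scores for the preferred and the
    rejected response to the same prompt (plain floats in this sketch).
    """
    d = r_chosen - r_rejected
    # Numerically stable form of -log(sigmoid(d)) = log(1 + exp(-d))
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

def kl_penalized_reward(reward: float, logprob_policy: float,
                        logprob_sft: float, beta: float = 0.02) -> float:
    """Reward maximised during RL fine-tuning (single-sample estimate).

    Subtracts beta * (log pi_RL(y|x) - log pi_SFT(y|x)) so the policy is
    discouraged from drifting far from the supervised model.
    beta=0.02 is an arbitrary illustrative value, not a recommended setting.
    """
    return reward - beta * (logprob_policy - logprob_sft)

# The loss is smaller when the preferred response already scores higher
print(reward_model_pairwise_loss(2.0, 0.0))
print(reward_model_pairwise_loss(0.0, 2.0))
```

The pairwise form only needs humans to rank responses, not to assign absolute scores, which is why ranking comparisons are a convenient labeling format for training the reward model.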

Slides and code here: https://github.com/tanchongmin/TensorFlow-Implementations

ChatGPT with plugins/tools/APIs here: https://www.youtube.com/watch?v=J1Xj0xXmtHU
Transformer Introduction here: https://www.youtube.com/watch?v=iBamMr2WEsQ

References:
Original Transformer Paper (Attention is all you need): https://arxiv.org/pdf/1706.03762.pdf
GPT-3 Paper (Language Models are Few-Shot Learners): https://arxiv.org/pdf/2005.14165.pdf
DialoGPT Paper (conversational AI by Microsoft): https://arxiv.org/pdf/1911.00536.pdf
InstructGPT Paper (with RLHF): https://arxiv.org/pdf/2203.02155.pdf

Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
Illustrated GPT-2: https://jalammar.github.io/illustrated-gpt2/

0:00 Introduction
3:09 Embedding Space
15:35 Overall Transformer Architecture
36:06 Transformer (Details)
49:28 GPT Architecture
56:38 GPT Training and Loss Function
1:05:25 Live Demo of GPT Next Token Generation and Attention Visualisation
1:16:55 Conversational AI
1:19:00 Reinforcement Learning from Human Feedback (RLHF)
1:45:15 Discussion

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/fXCZCPYs
Online AI blog: https://delvingintotech.wordpress.com/
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Twitch: https://www.twitch.tv/johncm99
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin

Other Videos By John Tan Chong Min

2023-04-04	Creating JARVIS: ChatGPT + APIs - HuggingGPT, Memory-Augmented Context, Meta GPT structures
2023-04-02	Is GPT4 capable of self-improving? Are we heading for AGI or AI doom?
2023-03-28	How Visual ChatGPT works + Toolformer/Wolfram Alpha. LLMs with Tools/APIs/Plugins is the way ahead!
2023-03-21	Tokenize any input, even continuous vectors! - Residual Vector Quantization - VALL-E (Part 2)
2023-03-07	Using Transformers to mimic anyone's voice! - VALL-E (Part 1)
2023-02-28	Learning Part-Whole Structure by Chunking - More Efficient than Deep Learning!!!
2023-02-21	High-level planning with large language models - SayCan
2023-02-13	Learning, Fast and Slow: Towards Fast and Adaptable Agents in Changing Environments
2023-02-07	Using Logic Gates as Neurons - Deep Differentiable Logic Gate Networks!
2023-01-31	Learn from External Memory, not just Weights: Large-Scale Retrieval for Reinforcement Learning
2023-01-17	How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)
2023-01-09	HyperTree Proof Search - Automated Theorem Proving with AlphaZero and Transformers!
2022-12-23	CodinGame Fall Challenge 2022: A First Look (managed to get to Silver!)
2022-12-21	Can ChatGPT solve CodinGame/Google Kickstart problems?
2022-12-19	Reinforcement Learning Fast and Slow: Goal-Directed and Memory Retrieval Mechanism!
2022-12-12	A New Framework of Memory for Learning (Part 1)
2022-11-14	Hippocampal Replay for Learning (Full Length with Questions)
2022-11-14	Hippocampal Replay for Learning (3 min summary)
2022-11-07	AlphaTensor: Using Reinforcement Learning for Efficient Matrix Multiplication
2022-10-27	Playing Go on TyGem and learning from AI (~ 3 kyu)
2022-10-13	Heroes of Might and Magic III - Armageddon's Blade Campaign (First Playthrough) - Final!!!
