How ChatGPT works - From Transformers to Reinforcement Learning from Human Feedback (RLHF)
ChatGPT was recently released by OpenAI, and it is fundamentally a next-token (word) prediction model: given a prompt, it predicts the next token(s). Trained on a massive internet corpus, it becomes very powerful and can perform many tasks zero-shot, such as summarization, code completion, and question answering.
Amid the hype around ChatGPT, it is easy to assume that the model can reason and think for itself. Here, we try to demystify how the model works, starting with a basic introduction to Transformers and then showing how the model's output can be improved using Reinforcement Learning from Human Feedback (RLHF).
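To make "next-token prediction" concrete, here is a minimal toy sketch in Python. It is not how GPT is implemented: GPT uses a Transformer that conditions on the whole prompt, whereas this sketch assumes a simple bigram count model that looks only at the previous word. The corpus and the function names (train_bigram, next_token_probs, generate) are made up for illustration. The generation loop has the same shape, though: get a distribution over the next token, pick one, append it, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word in the corpus."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Normalise the follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}

def generate(counts, prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

# Hypothetical toy corpus, assumed purely for illustration.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram(corpus)

print(next_token_probs(model, "the"))
print(generate(model, "the", max_new_tokens=3))
```

A real language model replaces the count table with a neural network and usually samples from the distribution instead of always taking the most likely token, but the predict-append-repeat loop is the same idea.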
Slides and code here: https://github.com/tanchongmin/TensorFlow-Implementations
ChatGPT with plugins/tools/APIs here: https://www.youtube.com/watch?v=J1Xj0xXmtHU
Transformer Introduction here: https://www.youtube.com/watch?v=iBamMr2WEsQ
References:
Original Transformer Paper (Attention is all you need): https://arxiv.org/pdf/1706.03762.pdf
GPT-3 Paper (Language Models are Few-Shot Learners): https://arxiv.org/pdf/2005.14165.pdf
DialoGPT Paper (conversational AI by Microsoft): https://arxiv.org/pdf/1911.00536.pdf
InstructGPT Paper (with RLHF): https://arxiv.org/pdf/2203.02155.pdf
Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
Illustrated GPT-2: https://jalammar.github.io/illustrated-gpt2/
0:00 Introduction
3:09 Embedding Space
15:35 Overall Transformer Architecture
36:06 Transformer (Details)
49:28 GPT Architecture
56:38 GPT Training and Loss Function
1:05:25 Live Demo of GPT Next Token Generation and Attention Visualisation
1:16:55 Conversational AI
1:19:00 Reinforcement Learning from Human Feedback (RLHF)
1:45:15 Discussion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am an AI and ML enthusiast who likes to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/fXCZCPYs
Online AI blog: https://delvingintotech.wordpress.com/
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Twitch: https://www.twitch.tv/johncm99
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin