DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence

Channel:

freeCodeCamp.org

Subscribers:

10,900,000

Published on March 11, 2025 3:57:40 PM ● Video Link: https://www.youtube.com/watch?v=K34gBCjzni8

25,458 views

920

Learn about DeepSeek R1's innovative AI architecture from @deeplearningexplained. The course explores how R1 achieves exceptional reasoning through reinforcement learning, focusing on Group Relative Policy Optimization (GRPO) and how it improves upon traditional PPO methods. You'll also understand KL divergence's role in model stability, with practical code examples and clear mathematical explanations.
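
For orientation, here is a minimal sketch of the "group relative" idea in GRPO: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed as a baseline. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; this is not code from the video or from any particular trainer.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    eps (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is an equally plausible choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above their group's average get a positive advantage and the rest get a negative one, which is the signal the policy update then uses.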

❤️ Try interactive AI courses we love, right in your browser: https://scrimba.com/freeCodeCamp-AI (Made possible by a grant from our friends at Scrimba)

Contents
⌨️ (0:00:00) Introduction
⌨️ (0:01:49) R1 Overview - Overview
⌨️ (0:03:52) R1 Overview - DeepSeek R1-zero path
⌨️ (0:05:32) R1 Overview - Reinforcement learning setup
⌨️ (0:08:36) R1 Overview - Group Relative Policy Optimization (GRPO)
⌨️ (0:13:04) R1 Overview - DeepSeek R1-zero result
⌨️ (0:16:53) R1 Overview - Cold start supervised fine-tuning
⌨️ (0:17:44) R1 Overview - Consistency reward for CoT
⌨️ (0:18:35) R1 Overview - Supervised fine-tuning data generation
⌨️ (0:21:06) R1 Overview - Reinforcement learning with neural reward model
⌨️ (0:22:53) R1 Overview - Distillation
⌨️ (0:26:16) GRPO - Overview
⌨️ (0:26:55) GRPO - PPO vs GRPO
⌨️ (0:30:25) GRPO - PPO formula overview
⌨️ (0:33:25) GRPO - GRPO formula overview
⌨️ (0:36:48) GRPO - GRPO pseudo code
⌨️ (0:38:56) GRPO - GRPO Trainer code
⌨️ (0:49:24) KL Divergence - Overview
⌨️ (0:49:55) KL Divergence - KL Divergence in GRPO vs PPO
⌨️ (0:51:20) KL Divergence - KL Divergence refresher
⌨️ (0:55:32) KL Divergence - Monte Carlo estimation of KL divergence
⌨️ (0:56:43) KL Divergence - Schulman blog
⌨️ (0:57:38) KL Divergence - k1 = log(q/p)
⌨️ (1:00:01) KL Divergence - k2 = 0.5*log(p/q)^2
⌨️ (1:02:19) KL Divergence - k3 = (p/q - 1) - log(p/q)
⌨️ (1:04:44) KL Divergence - benchmarking
⌨️ (1:07:28) Conclusion
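
The three estimators named in the KL Divergence chapters can be compared with a small Monte Carlo sketch. It assumes samples are drawn from q and that the target is KL(q || p), which is the setting where k1 = log(q/p) is unbiased. The two unit-variance Gaussians, the function name, the sample count, and the seed are illustrative choices, not the benchmark shown in the video.

```python
import math
import random

MU_P = 0.1                 # assumed mean of p; q is a standard normal
TRUE_KL = 0.5 * MU_P ** 2  # closed-form KL(q || p) for two unit-variance Gaussians


def kl_estimates(n_samples=100_000, mu_p=MU_P, seed=0):
    """Average k1, k2, k3 over samples x ~ q = N(0, 1), with p = N(mu_p, 1)."""
    rng = random.Random(seed)
    k1_sum = k2_sum = k3_sum = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        # log(p(x)/q(x)); the Gaussian normalizing constants cancel.
        log_r = -0.5 * (x - mu_p) ** 2 + 0.5 * x ** 2
        k1_sum += -log_r                        # k1 = log(q/p): unbiased, can be negative per sample
        k2_sum += 0.5 * log_r ** 2              # k2 = 0.5*log(p/q)^2: slightly biased, never negative
        k3_sum += math.expm1(log_r) - log_r     # k3 = (p/q - 1) - log(p/q): unbiased, never negative
    return {
        "k1": k1_sum / n_samples,
        "k2": k2_sum / n_samples,
        "k3": k3_sum / n_samples,
    }


if __name__ == "__main__":
    print("true KL:", TRUE_KL)
    print(kl_estimates())
```

`math.expm1(log_r)` computes p/q - 1 directly, which keeps k3 numerically stable when the two distributions are close.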

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hu

Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news

Other Videos By freeCodeCamp.org

2025-04-01	Code DeepSeek V3 From Scratch in Python - Full Course
2025-03-28	From broke musician to working dev. How college drop-out Ryan Furrer learned to code [Podcast #166]
2025-03-27	Excel Formulas & Functions You Should Know [Full Course]
2025-03-25	Microservices in Nest.js – JavaScript Tutorial
2025-03-21	From hating coding to programming satellites at age 37 – Francesco Ciulla interview [Podcast #165]
2025-03-19	Learn ANY Language with AI (Learn English, Learn Spanish, Learn Mandarin Chinese, and more)
2025-03-18	Build a Full Stack AI Note Taking App with Next.js and Supabase – Tutorial
2025-03-14	How to become a self-taught developer while supporting a family [Podcast #164]
2025-03-13	AWS Cognito Course – Authentication and Authorization
2025-03-12	JavaScript Essentials Course
2025-03-11	DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence
2025-03-07	Learn fewer skills but go deeper - the Caleb Curry interview [Podcast #163]
2025-03-06	Learn PyTorch in 5 Projects – Tutorial
2025-03-05	Intro to Machine Learning featuring Generative AI
2025-03-04	Unity Tutorial – Massive Multiplayer Online (MMO) Game with SpacetimeDB
2025-02-28	How to become a developer in your 30s with Anjana Vakil [Podcast #162]
2025-02-27	Linear Algebra for Machine Learning
2025-02-26	Build a Full Stack AI-Powered Web App with ChatGPT API
2025-02-25	Vision Transformer from Scratch Tutorial
2025-02-23	How to go full-on Renaissance Man mode in 2025 with Vaughn Gene [Podcast #161]
2025-02-20	Kubernetes and EKS for Beginners – Crash Course with Pulumi
