DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence

Video Link: https://www.youtube.com/watch?v=K34gBCjzni8





Learn about DeepSeek R1's innovative AI architecture from @deeplearningexplained. The course explores how R1 achieves exceptional reasoning through reinforcement learning, focusing on Group Relative Policy Optimization (GRPO) and how it improves upon traditional PPO methods. You'll also understand KL divergence's role in model stability, with practical code examples and clear mathematical explanations.
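
As a rough illustration of the "group relative" idea in GRPO: instead of learning a separate value model as PPO does, each sampled answer is scored against the other answers drawn for the same prompt. The sketch below is not taken from the video. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled answers.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    Using the population standard deviation is an assumption here;
    implementations may use the sample standard deviation instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every answer gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Answers that beat the group average get a positive advantage and the rest get a negative one, so no critic network is needed to provide a baseline.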

❤️ Try interactive AI courses we love, right in your browser: https://scrimba.com/freeCodeCamp-AI (Made possible by a grant from our friends at Scrimba)

Contents
⌨️ (0:00:00) Introduction
⌨️ (0:01:49) R1 Overview - Overview
⌨️ (0:03:52) R1 Overview - DeepSeek R1-zero path
⌨️ (0:05:32) R1 Overview - Reinforcement learning setup
⌨️ (0:08:36) R1 Overview - Group Relative Policy Optimization (GRPO)
⌨️ (0:13:04) R1 Overview - DeepSeek R1-zero result
⌨️ (0:16:53) R1 Overview - Cold start supervised fine-tuning
⌨️ (0:17:44) R1 Overview - Consistency reward for CoT
⌨️ (0:18:35) R1 Overview - Supervised Fine tuning data generation
⌨️ (0:21:06) R1 Overview - Reinforcement learning with neural reward model
⌨️ (0:22:53) R1 Overview - Distillation
⌨️ (0:26:16) GRPO - Overview
⌨️ (0:26:55) GRPO - PPO vs GRPO
⌨️ (0:30:25) GRPO - PPO formula overview
⌨️ (0:33:25) GRPO - GRPO formula overview
⌨️ (0:36:48) GRPO - GRPO pseudo code
⌨️ (0:38:56) GRPO - GRPO Trainer code
⌨️ (0:49:24) KL Divergence - Overview
⌨️ (0:49:55) KL Divergence - KL Divergence in GRPO vs PPO
⌨️ (0:51:20) KL Divergence - KL Divergence refresher
⌨️ (0:55:32) KL Divergence - Monte Carlo estimation of KL divergence
⌨️ (0:56:43) KL Divergence - Schulman blog
⌨️ (0:57:38) KL Divergence - k1 = log(q/p)
⌨️ (1:00:01) KL Divergence - k2 = 0.5*log(p/q)^2
⌨️ (1:02:19) KL Divergence - k3 = (p/q - 1) - log(p/q)
⌨️ (1:04:44) KL Divergence - benchmarking
⌨️ (1:07:28) Conclusion
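
The three estimators named in the KL Divergence chapters can be compared with a small Monte Carlo experiment. The sketch below is not the video's code. It assumes samples are drawn from q = N(0, 1), takes p = N(0.1, 1), and estimates KL(q || p), whose exact value for equal-variance Gaussians is (0.1)^2 / 2 = 0.005. The function names, sample count, and seed are illustrative choices.

```python
import math
import random


def log_normal_pdf(x, mean, std=1.0):
    """Log density of a normal distribution N(mean, std^2) at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def kl_estimates(n_samples=100_000, mean_p=0.1, seed=0):
    """Monte Carlo estimates of KL(q || p) using samples x ~ q = N(0, 1)."""
    rng = random.Random(seed)
    k1 = k2 = k3 = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        # log_r = log(p(x) / q(x))
        log_r = log_normal_pdf(x, mean_p) - log_normal_pdf(x, 0.0)
        k1 += -log_r                       # k1 = log(q/p): unbiased, can be negative
        k2 += 0.5 * log_r ** 2             # k2 = 0.5*log(p/q)^2: biased, never negative
        k3 += math.expm1(log_r) - log_r    # k3 = (p/q - 1) - log(p/q): unbiased, never negative
    return {
        "true": 0.5 * mean_p ** 2,
        "k1": k1 / n_samples,
        "k2": k2 / n_samples,
        "k3": k3 / n_samples,
    }


for name, value in kl_estimates().items():
    print(f"{name}: {value:.5f}")
```

All three averages should land near 0.005, with k1 fluctuating more between seeds than k2 or k3, since its per-sample values can be negative.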


🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hu

Learn to code for free and get a developer job: https://www.freecodecamp.org/
Read hundreds of articles on programming: https://freecodecamp.org/news




Other Videos By freeCodeCamp.org


2025-04-01 Code DeepSeek V3 From Scratch in Python - Full Course
2025-03-28 From broke musician to working dev. How college drop-out Ryan Furrer learned to code [Podcast #166]
2025-03-27 Excel Formulas & Functions You Should Know [Full Course]
2025-03-25 Microservices in Nest.js – JavaScript Tutorial
2025-03-21 From hating coding to programming satellites at age 37 – Francesco Ciulla interview [Podcast #165]
2025-03-19 Learn ANY Language with AI (Learn English, Learn Spanish, Learn Mandarin Chinese, and more)
2025-03-18 Build a Full Stack AI Note Taking App with Next.js and Supabase – Tutorial
2025-03-14 How to become a self-taught developer while supporting a family [Podcast #164]
2025-03-13 AWS Cognito Course – Authentication and Authorization
2025-03-12 JavaScript Essentials Course
2025-03-11 DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence
2025-03-07 Learn fewer skills but go deeper - the Caleb Curry interview [Podcast #163]
2025-03-06 Learn PyTorch in 5 Projects – Tutorial
2025-03-05 Intro to Machine Learning featuring Generative AI
2025-03-04 Unity Tutorial – Massive Multiplayer Online (MMO) Game with SpacetimeDB
2025-02-28 How to become a developer in your 30s with Anjana Vakil [Podcast #162]
2025-02-27 Linear Algebra for Machine Learning
2025-02-26 Build a Full Stack AI-Powered Web App with ChatGPT API
2025-02-25 Vision Transformer from Scratch Tutorial
2025-02-23 How to go full-on Renaissance Man mode in 2025 with Vaughn Gene [Podcast #161]
2025-02-20 Kubernetes and EKS for Beginners – Crash Course with Pulumi