R-Zero: Self-Evolving Reasoning LLM from Zero Data

Channel: John Tan Chong Min

Published on August 18, 2025, 3:05:48 PM
Video link: https://www.youtube.com/watch?v=8wyjQA-I_AQ

Can LLMs propose their own curriculum and learn from their own curriculum?

R-Zero is an exciting new approach testing this.

Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math reasoning benchmarks, and +7.54 on general-domain reasoning benchmarks.

How it works:
It pits an LLM Challenger (teacher) against an LLM Solver (student) and iteratively improves each one.
Each iteration, the Challenger proposes harder questions that stump the Solver about 50% of the time, i.e. questions where only about half of the Solver's sampled answers agree with its majority answer.
The Solver then ups its game by training on these questions, using its own majority-vote (consensus) answer as the label.
Repeat and (hopefully) get better Solvers!
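A minimal Python sketch of the reward signals described above follows. It is based on my reading of the paper, not on the official code in the R-Zero repo. It makes several assumptions:

- Answers are already-extracted strings.
- The helper names (majority_vote, challenger_reward, keep_for_solver, solver_rewards, group_relative_advantages) are hypothetical.
- The delta=0.25 filter threshold is an assumed value.
- The Challenger's repetition penalty and format checks from the paper are omitted.

The GRPO part shows only the group-normalised advantage, not the full policy update.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    label, count = counts.most_common(1)[0]
    return label, count / len(answers)


def challenger_reward(solver_answers):
    """Uncertainty reward for the Challenger (sketch).

    Peaks at 1.0 when half of the Solver's samples agree with the majority
    answer, and falls to 0.0 when all of them agree (question too easy).
    """
    _, p_hat = majority_vote(solver_answers)
    return 1.0 - 2.0 * abs(p_hat - 0.5)


def keep_for_solver(solver_answers, delta=0.25):
    """Keep only questions near the Solver's ability edge (delta is an assumed value)."""
    _, p_hat = majority_vote(solver_answers)
    return abs(p_hat - 0.5) <= delta


def solver_rewards(solver_answers):
    """Binary Solver reward: 1.0 if a sample matches the majority pseudo-label, else 0.0."""
    label, _ = majority_vote(solver_answers)
    return [1.0 if a == label else 0.0 for a in solver_answers]


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalise each reward by its group's mean and std.

    Uses the population std as an assumed choice; eps avoids division by zero
    when every sample in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Ten hypothetical Solver samples for one Challenger question.
    answers = ["12", "12", "12", "7", "9", "12", "7", "12", "3", "12"]
    print(majority_vote(answers))
    print(challenger_reward(answers))
    print(keep_for_solver(answers))
    print(group_relative_advantages(solver_rewards(answers)))
```

With 6 of 10 samples agreeing, the agreement rate is 0.6 and the Challenger reward is 1 - 2 * |0.6 - 0.5| = 0.8. A question every sample answers identically scores 0.0, so the Challenger is pushed toward questions the Solver is unsure about. Note that the Solver's "correct" label here is only its own consensus, which is exactly the missing objective ground truth discussed below.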

My take:
The idea is interesting, but lacks an objective ground truth. Perhaps having some expert data is still necessary after all. However, the idea of iterative improvement is a very nice one to understand.

~~~

Slides: https://github.com/tanchongmin/john-youtube/blob/main/Discussion_Sessions/R-Zero%20Slides.pdf
Paper: https://www.alphaxiv.org/abs/2508.05004
Code: https://github.com/Chengsong-Huang/R-Zero

Related References:
DeepSeek R1 (trained with GRPO): https://arxiv.org/pdf/2501.12948
AlphaGo Zero: https://deepmind.google/discover/blog/alphago-zero-starting-from-scratch/
GANs: https://arxiv.org/pdf/1406.2661

~~~

0:00 Introduction
1:48 Illustrative Example
11:40 Co-evolving Challenger and Solver
13:32 Impressive Performance Gains
15:39 Group Relative Policy Optimisation (GRPO)
31:19 Iterative Improvement of Challenger and Solver
34:46 Step 1: Training the Challenger
47:10 Step 2: Training the Solver
49:59 Results and Insights
1:04:54 Comparison to Voyager
1:06:40 Discussion

~~~

AI and ML enthusiast. Likes to think about the essences behind breakthroughs of AI and explain it in a simple and relatable way. Also, I am an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin
