R-Zero: Self-Evolving Reasoning LLM from Zero Data
Can LLMs propose their own curriculum and learn from it?
R-Zero is an exciting new approach testing this.
Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
How it works:
It pits an LLM Challenger (teacher) against the LLM Solver (student), and iteratively improves each one.
Each iteration, the Challenger proposes harder questions, aiming for ones that stump the Solver about 50% of the time.
The Solver then ups its game and learns to answer those questions, treating the majority consensus answer as the target.
Repeat and (hopefully) get better Solvers!
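
To make the loop above concrete, here is a minimal Python sketch of the two reward signals. It is my own illustration, not the authors' code: the function names are hypothetical, and the reward shape 1 - 2 * |agreement - 0.5| is just one simple way to say "the Challenger scores highest when the Solver agrees with itself about 50% of the time".

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and the fraction of samples agreeing with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

def challenger_reward(agreement):
    """Assumed reward shape: 1.0 at 50% agreement, 0.0 when the Solver is unanimous."""
    return 1.0 - 2.0 * abs(agreement - 0.5)

def solver_reward(answer, pseudo_label):
    """The Solver is rewarded for matching the majority answer, not a true label."""
    return 1.0 if answer == pseudo_label else 0.0

# Eight hypothetical Solver samples for one Challenger question.
samples = ["12", "12", "15", "12", "9", "15", "12", "7"]
label, agreement = majority_vote(samples)
print(label, agreement, challenger_reward(agreement))
print([solver_reward(a, label) for a in samples])
```

Note that the "label" here is only whatever the Solver says most often, so a confidently wrong majority gets reinforced just like a correct one.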
My take:
The idea is interesting, but it lacks an objective ground truth: the Solver is trained on its own majority answer, which may be wrong. Perhaps some expert data is still necessary after all. Still, the idea of iterative improvement is well worth understanding.
~~~
Slides: https://github.com/tanchongmin/john-youtube/blob/main/Discussion_Sessions/R-Zero Slides.pdf
Paper: https://www.alphaxiv.org/abs/2508.05004
Code: https://github.com/Chengsong-Huang/R-Zero
Related References:
DeepSeek R1 (Introducing GRPO): https://arxiv.org/pdf/2501.12948
AlphaGo Zero: https://deepmind.google/discover/blog/alphago-zero-starting-from-scratch/
GANs: https://arxiv.org/pdf/1406.2661
~~~
0:00 Introduction
1:48 Illustrative Example
11:40 Co-evolving Challenger and Solver
13:32 Impressive Performance Gains
15:39 Group Relative Policy Optimisation (GRPO)
31:19 Iterative Improvement of Challenger and Solver
34:46 Step 1: Training the Challenger
47:10 Step 2: Training the Solver
49:59 Results and Insights
1:04:54 Comparison to Voyager
1:06:40 Discussion
~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin