No need for symbolic programs for Math? Natural language approach to IMO
The International Mathematical Olympiad (IMO) is an international mathematics competition that challenges participants with exceptionally difficult problems in fields like algebra, number theory and combinatorics.
Previously, LLM-based approaches had conquered math benchmarks like GSM8K and AIME, but had only reached silver-medal performance at the IMO.
Solving IMO problems takes multi-step reasoning and the creativity to think beyond the norm.
OpenAI and Google's Gemini have both claimed gold-level performance at IMO 2025, with Gemini's result being officially verified.
Here, let us take a look at how two researchers, Yichen Huang and Lin F. Yang, managed to attain gold-level performance as well.
They used an LLM in a pipeline that generates solutions, verifies them and then self-improves them based on the verification feedback.
This is amazing, as I previously thought that a robust verifier was needed for math.
Apparently, if the LLM is well trained on math datasets, you can use the LLM itself as the verifier.
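To make the generate, verify and self-improve idea concrete, here is a minimal sketch of such a loop. It is not the authors' code (see the Code link for that). The names solve, toy_generate, toy_verify and toy_improve and the parameters max_rounds and passes_needed are all hypothetical, and the toy functions are simple stand-ins for what would be LLM calls in a real pipeline. Requiring several consecutive passes is my assumption for coping with an imperfect verifier.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: in a real pipeline each would wrap an LLM call.
Generate = Callable[[str], str]
Verify = Callable[[str, str], Tuple[bool, str]]  # returns (accepted, feedback)
Improve = Callable[[str, str, str], str]


def solve(problem: str, generate: Generate, verify: Verify, improve: Improve,
          max_rounds: int = 5, passes_needed: int = 2) -> Optional[str]:
    """Generate a solution, then alternate verification and improvement."""
    solution = generate(problem)
    passes = 0
    for _ in range(max_rounds):
        accepted, feedback = verify(problem, solution)
        if accepted:
            # Assumption: one pass is not trusted, so ask for several in a row.
            passes += 1
            if passes >= passes_needed:
                return solution
        else:
            passes = 0
            solution = improve(problem, solution, feedback)
    return None  # no solution survived verification within the budget


# Toy stand-ins so the sketch runs without any model.
def toy_generate(problem: str) -> str:
    return "40"  # deliberately wrong first draft


def toy_verify(problem: str, solution: str) -> Tuple[bool, str]:
    a, b = (int(t) for t in problem.split("*"))
    expected = a * b
    if int(solution) == expected:
        return True, ""
    feedback = "too low" if int(solution) < expected else "too high"
    return False, feedback


def toy_improve(problem: str, solution: str, feedback: str) -> str:
    step = 1 if feedback == "too low" else -1
    return str(int(solution) + step)


if __name__ == "__main__":
    print(solve("6*7", toy_generate, toy_verify, toy_improve))
```

Here the first draft fails verification twice, gets nudged by the feedback each time, and is accepted only after passing the check twice in a row. With a smaller max_rounds the loop gives up and returns None instead of returning an unverified answer.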
~~~
Links:
Slides: https://github.com/tanchongmin/john-youtube/blob/main/Discussion_Sessions/IMO_Gemini.pdf
Paper: https://www.alphaxiv.org/pdf/2507.15855
Code: https://github.com/lyang36/IMO25
Other References:
Gemini IMO Gold: https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
Gemini Deep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#deep-think
AlphaGeometry: https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
AlphaProof (Silver-level IMO performance with symbolic solver): https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
AlphaCode: https://deepmind.google/discover/blog/competitive-programming-with-alphacode/
ARC-AGI Challenge Ryan Greenblatt's "Sample More" solution: https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
~~~
0:00 Introduction
5:32 Domain-Specific Language Approach
10:26 From DSL to Natural Language
14:53 Deep Think
23:35 Natural Language Approach to Math
56:07 AlphaEvolve
1:00:52 Open-sourced IMO Gold AI Workflow
1:19:12 Detailed Steps
1:28:56 Key Takeaway: Verifier is not perfect - but system can still work!
1:33:09 My thoughts
1:55:36 Discussion
2:04:38 Conclusion
~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin