How many instructions can LLMs follow at once?
As LLMs take on more complex tasks, how many instructions should we give them in one go for reliable, robust generation?
The ability to follow a greater number of instructions will greatly help LLMs with more complex tool use, multi-step reasoning, and agentic tasks.
By requiring LLMs to include difficult financial keywords in a generated report, this study measures how well 20 SOTA LLMs handle an increasing number of constraints.
Overall, it appears that o3 (reasoning) and gemini-2.5-pro-preview (reasoning) are the best at following complex instructions.
On another note, should we aim to increase instruction-following complexity, or should we modularise the process into easy, bite-sized pieces?
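
To make the keyword-constraint setup above concrete, here is a minimal sketch of how keyword inclusion could be scored. It assumes a simple case-insensitive, whole-word match; the paper's actual scoring may differ, and the function name, sample report, and keyword list are all hypothetical, made up for illustration.

```python
import re


def keyword_inclusion_rate(report: str, keywords: list[str]) -> float:
    """Return the fraction of required keywords found in the report.

    Assumption: a keyword counts as followed if it appears as a whole
    word, ignoring case. This is an illustrative rule, not the paper's.
    """
    if not keywords:
        return 1.0  # no constraints, so nothing was missed
    text = report.lower()
    hits = sum(
        1
        for kw in keywords
        if re.search(rf"\b{re.escape(kw.lower())}\b", text)
    )
    return hits / len(keywords)


# Hypothetical report and keyword list, for illustration only.
report = "Revenue grew while amortization and liquidity stayed flat."
keywords = ["amortization", "liquidity", "arbitrage", "solvency"]
print(keyword_inclusion_rate(report, keywords))  # 0.5
```

Running the same check while growing the keyword list gives a rough picture of how accuracy changes as the number of instructions increases.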
~~~
Slides: https://github.com/tanchongmin/john-youtube/blob/main/Discussion_Sessions/LLM_Instructions.pdf
Paper: https://arxiv.org/pdf/2507.11538
Other references:
T5 Paper: https://arxiv.org/pdf/1910.10683
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs (longer responses tend to be less accurate): https://arxiv.org/html/2505.00127v1
My repositories mentioned:
StrictJSON: https://github.com/tanchongmin/strictjson
AgentJo: https://github.com/tanchongmin/agentjo
text-rpg (my attempt at vibe-coding an RPG): https://github.com/tanchongmin/text-rpg
~~~
0:00 Introduction
5:21 Main Results
16:22 Why is instruction following important?
24:37 Experiment Details
30:21 Report Generation Prompt
42:22 Verbosity of Response vs Accuracy
47:06 Variability of Accuracy across models
1:00:41 Does reasoning help with instruction following?
1:12:22 My guidelines: How to use LLMs in a process / agentic flow
1:28:16 Discussion
1:38:20 Conclusion
~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin