Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Subscribers: 292,000
Video link: https://www.youtube.com/watch?v=hpC4qjWu_aY



Views: 9,350


Paper: https://research.trychroma.com/context-rot

Abstract:
Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.
In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.
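
For intuition, below is a minimal sketch of the kind of length-scaling probe the report motivates: embed one known fact in progressively longer filler text and check whether the model still retrieves it. It assumes the OpenAI Python client; the model name, filler sentence, and "needle" fact are illustrative placeholders, not the authors' actual evaluation harness.

# Sketch: does retrieval of a single fact degrade as input length grows?
# Assumes the OpenAI Python client (pip install openai) and an API key in OPENAI_API_KEY.
# Model name, filler text, and needle below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

NEEDLE = "The access code for the archive is 7142."
QUESTION = "What is the access code for the archive? Answer with the number only."
FILLER = "The committee reviewed routine matters and adjourned without further discussion. "

def build_prompt(total_words: int) -> str:
    # Surround the needle with filler so the relevant fact is a tiny fraction of the input.
    words = (FILLER * (total_words // len(FILLER.split()) + 1)).split()
    half = len(words) // 2
    return " ".join(words[:half]) + " " + NEEDLE + " " + " ".join(words[half:])

for total_words in [100, 1_000, 10_000]:
    prompt = build_prompt(total_words)
    response = client.chat.completions.create(
        model="gpt-4.1",  # placeholder; any chat model works
        messages=[{"role": "user", "content": prompt + "\n\n" + QUESTION}],
    )
    answer = response.choices[0].message.content or ""
    print(f"{total_words:>6} words -> correct: {'7142' in answer}")

If context were used uniformly, accuracy in such a probe should not depend on how much filler surrounds the fact; the report's finding is that in practice it does.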

Authors: Kelly Hong, Anton Troynikov, Jeff Huber

Links:
Homepage: https://ykilcher.com/
Merch:
YouTube:
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2 days ago - Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)
6 days ago - Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
2025-05-03 - On the Biology of a Large Language Model (Part 2)
2025-04-05 - On the Biology of a Large Language Model (Part 1)
2025-01-26 - [GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
2024-12-26 - Traditional Holiday Live Stream
2024-12-24 - Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)
2024-12-10 - Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)
2024-11-23 - TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)
2024-10-19 - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
2024-10-12 - Were RNNs All We Needed? (Paper Explained)
2024-10-05 - Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
2024-08-04 - Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)
2024-07-08 - Scalable MatMul-free Language Modeling (Paper Explained)
2024-06-26 - Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)
2024-06-01 - xLSTM: Extended Long Short-Term Memory
2024-05-21 - [ML News] OpenAI is in hot waters (GPT-4o, Ilya Leaving, Scarlett Johansson legal action)
2024-05-01 - ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
2024-04-30 - [ML News] Chips, Robots, and Models
2024-04-28 - TransformerFAM: Feedback attention is working memory
2024-04-27 - [ML News] Devin exposed | NeurIPS track for high school students