LLMs vs. Torch 1.5: Why Your Code Assistant Can't Keep Up

Video Link: https://www.youtube.com/watch?v=8s1hX6Xw_3Q

Speakers: Diganta Misra
Host: Sanchit Ahuja

In the fast-evolving world of software libraries, code generation models are struggling to keep pace. Most existing benchmarks focus on static, version-agnostic code predictions, failing to capture the true complexity of adapting to frequent updates and maintaining compatibility with multiple library versions. To address this gap, we introduce GitChameleon, a novel dataset featuring 116 Python code completion tasks, each tied to specific library versions and accompanied by executable unit tests. This dataset is designed to rigorously evaluate the ability of large language models (LLMs) to generate version-specific code that is both syntactically correct and functionally accurate. Our findings are revealing: even state-of-the-art models like GPT-4o achieve a pass@10 of just 39.9% (43.7% with error feedback), highlighting significant limitations in their ability to adapt to versioned code. In this talk, I’ll explore why today’s LLMs, while impressive, still fall short in the dynamic landscape of evolving software libraries. By examining these challenges, we hope to spark a conversation about how to build more adaptable, reliable code generation tools for the future.
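The pass@10 figure cited above is conventionally computed with the unbiased pass@k estimator introduced in the Codex paper; the sketch below shows that standard formula, not the authors' exact evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c pass
    the unit tests, is correct.  Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct generation.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 generations of which c = 5 pass, pass@1 is 0.5, while pass@10 is 1.0 as long as at least one generation is correct.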




Other Videos By Microsoft Research


2025-04-22 Hamming Quasi-Cyclic
2025-04-22 Towards Safer Augmented Reality: Identifying, Evaluating, and Mitigating Security & Privacy Threats
2025-04-22 Shining light on the learning brain: Estimating mental workload in a simulated flight task using opt
2025-03-24 How to Compress Garbled Circuit Input Labels, Efficiently
2025-03-24 Differentially Private Synthetic Data without Training
2025-03-21 Celebrating Susan Dumais: Reflections on a Legacy of Research and Collaboration | Plenary Session
2025-03-21 The Assistant: Situated Interaction Project (2012)
2025-03-20 The AI Revolution in Medicine, Revisited: An Introduction
2025-03-10 AI and Europe's history of reinvention
2025-03-03 World and Human Action Models towards gameplay ideation (Supplementary Video 1)
2025-03-03 LLMs vs. Torch 1.5: Why Your Code Assistant Can't Keep Up
2025-02-25 Using LLMs for safe low-level programming | Microsoft Research Forum
2025-02-25 AutoGen v0.4: Reimagining the foundation of agentic AI for scale and more | Microsoft Research Forum
2025-02-25 Belief state transformers | Microsoft Research Forum
2025-02-25 Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum
2025-02-25 Chimera: Accurate synthesis prediction by ensembling models with... | Microsoft Research Forum
2025-02-25 AI for Precision Health: Learning the language of nature and patients | Microsoft Research Forum
2025-02-25 Keynote: Multimodal Generative AI for Precision Health | Microsoft Research Forum
2025-02-21 WHAM Demonstrator tutorial
2025-02-07 Attestations over TLS 1.3 and ZKP
2025-01-02 Accelerating Multilingual RAG Systems