LLMs vs. Torch 1.5: Why Your Code Assistant Can't Keep Up

Video Link: https://www.youtube.com/watch?v=8s1hX6Xw_3Q

Speakers: Diganta Misra
Host: Sanchit Ahuja

In the fast-evolving world of software libraries, code generation models are struggling to keep pace. Most existing benchmarks focus on static, version-agnostic code predictions, failing to capture the true complexity of adapting to frequent updates and maintaining compatibility with multiple library versions. To address this gap, we introduce GitChameleon, a novel dataset featuring 116 Python code completion tasks, each tied to specific library versions and accompanied by executable unit tests. This dataset is designed to rigorously evaluate the ability of large language models (LLMs) to generate version-specific code that is both syntactically correct and functionally accurate. Our findings are revealing: even state-of-the-art models like GPT-4o achieve a pass@10 of just 39.9% (43.7% with error feedback), highlighting significant limitations in their ability to adapt to versioned code. In this talk, I’ll explore why today’s LLMs, while impressive, still fall short in the dynamic landscape of evolving software libraries. By examining these challenges, we hope to spark a conversation about how to build more adaptable, reliable code generation tools for the future.
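The pass@10 figure cited above is conventionally computed with the unbiased pass@k estimator introduced in the Codex paper; the sketch below shows that standard formula, not the authors' exact evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c pass
    the unit tests, is correct.  Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct generation.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 generations of which c = 5 pass, pass@1 is 0.5, while pass@10 is 1.0 as long as at least one generation is correct.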




Other Videos By Microsoft Research


2025-04-22 Hamming Quasi-Cyclic
2025-04-22 Towards Safer Augmented Reality: Identifying, Evaluating, and Mitigating Security & Privacy Threats
2025-04-22 Shining light on the learning brain: Estimating mental workload in a simulated flight task using opt
2025-03-24 How to Compress Garbled Circuit Input Labels, Efficiently
2025-03-24 Differentially Private Synthetic Data without Training
2025-03-21 Celebrating Susan Dumais: Reflections on a Legacy of Research and Collaboration | Plenary Session
2025-03-21 The Assistant: Situated Interaction Project (2012)
2025-03-20 The AI Revolution in Medicine, Revisited: An Introduction
2025-03-10 AI and Europe's history of reinvention
2025-03-03 World and Human Action Models towards gameplay ideation (Supplementary Video 1)
2025-03-03 LLMs vs. Torch 1.5: Why Your Code Assistant Can't Keep Up
2025-02-25 Using LLMs for safe low-level programming | Microsoft Research Forum
2025-02-25 AutoGen v0.4: Reimagining the foundation of agentic AI for scale and more | Microsoft Research Forum
2025-02-25 Belief state transformers | Microsoft Research Forum
2025-02-25 Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum
2025-02-25 Chimera: Accurate synthesis prediction by ensembling models with... | Microsoft Research Forum
2025-02-25 AI for Precision Health: Learning the language of nature and patients | Microsoft Research Forum
2025-02-25 Keynote: Multimodal Generative AI for Precision Health | Microsoft Research Forum
2025-02-21 WHAM Demonstrator tutorial
2025-02-07 Attestations over TLS 1.3 and ZKP
2025-01-02 Accelerating Multilingual RAG Systems