Magma: A foundation model for multimodal AI Agents | Microsoft Research Forum

Video link: https://www.youtube.com/watch?v=SbfzvUU5yM8





Jianwei Yang, Principal Researcher at Microsoft Research Redmond, introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. The talk covers two new techniques, Set-of-Mark and Trace-of-Mark, for action grounding and planning, and details the unified pretraining pipeline through which the model learns agentic capabilities.

Magma on arXiv: https://arxiv.org/pdf/2502.13130
Magma project page and code (GitHub Pages): https://microsoft.github.io/Magma/
Azure AI Foundry: https://ai.azure.com/

This session aired on February 25, 2025, at Microsoft Research Forum, Episode 5.

Register for the series: https://aka.ms/registerresearchforumYTe5

Continue watching episode 5: https://aka.ms/researchforumYTe5
Explore all previous episodes: https://aka.ms/researchforumYTplaylist



