Gemini Robotics 1.5: Enabling robots to plan, think and use tools to solve complex tasks
We’re powering an era of physical agents with Gemini Robotics 1.5 — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.
🤖 Gemini Robotics 1.5 is our most capable vision-language-action (VLA) model that turns visual information and instructions into motor commands for a robot to perform a task. This model thinks before taking action and shows its process, helping robots assess and complete complex tasks more transparently. It also learns across embodiments, accelerating skill learning.
🤖 Gemini Robotics-ER 1.5 is our most capable vision-language model (VLM) that reasons about the physical world, natively calls digital tools and creates detailed, multi-step plans to complete a mission. This model now achieves state-of-the-art performance across spatial understanding benchmarks.
We’re making Gemini Robotics-ER 1.5 available to developers via the Gemini API in Google AI Studio, and Gemini Robotics 1.5 available to select partners.
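For developers who want to try it, below is a minimal sketch of calling Gemini Robotics-ER 1.5 through the Gemini API with the google-genai Python SDK. The model ID string, image filename and prompt are illustrative assumptions; check Google AI Studio for the exact identifier and prompting guidance.

# Minimal sketch: asking Gemini Robotics-ER 1.5 to point at objects in a scene.
# The model ID and prompt below are assumptions for illustration only.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")   # API key from Google AI Studio

with open("workbench.jpg", "rb") as f:          # any image of a robot workspace
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",      # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Point to every graspable object on the table and return JSON "
        "with a label and normalized [y, x] coordinates for each one.",
    ],
)

print(response.text)  # spatial-reasoning output, e.g. a JSON list of labeled points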
Learn more: https://deepmind.google/models/gemini-robotics/
------
Subscribe to our channel: @googledeepmind
Find us on X https://twitter.com/GoogleDeepMind
Follow us on Instagram https://instagram.com/googledeepmind
Add us on LinkedIn https://www.linkedin.com/company/deepmind/