Research · Apr 18, 2026

DeepMind releases Gemini Robotics-ER 1.6 with enhanced spatial reasoning for autonomous tasks

The upgraded model improves embodied reasoning capabilities including pointing, counting, and a new instrument-reading feature developed in collaboration with Boston Dynamics.

Trust: 59

Hype: Some hype

TL;DR

DeepMind introduced Gemini Robotics-ER 1.6, an upgraded reasoning-first model designed to help robots understand and interact with physical environments.
The model demonstrates enhanced capabilities in spatial reasoning, multi-view understanding, and a newly added instrument-reading function for reading gauges and sight glasses.
The model is available to developers immediately via the Gemini API and Google AI Studio, with benchmark comparisons showing improvements over previous versions.
Key capabilities include pointing for spatial reasoning, counting, success detection, and task planning through integration with tools like Google Search and vision-language-action models.

DeepMind has released Gemini Robotics-ER 1.6, a specialized reasoning model designed to help robots understand and navigate physical environments more precisely. It builds on embodied reasoning, the ability to connect digital intelligence with real-world action, and focuses on spatial understanding and multi-view perception, a core challenge in autonomous robotics.

The model's core capabilities center on spatial reasoning tasks essential to robotic systems. Pointing underpins several reasoning patterns: robots can use points to detect and count objects, establish spatial relationships, map movement trajectories, and apply logical constraints to determine which items meet specific criteria. In demonstration tests, the system accurately counted tools by category and did not report items that were absent from the image.
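To make the counting-by-pointing idea concrete, here is a minimal sketch of how a client might tally labeled points and map them onto an image. The response format (a JSON list of labeled `[y, x]` points normalized to 0-1000), the sample data, and the helper names `to_pixels` and `count_by_label` are assumptions for illustration and do not come from the article.

```python
import json
from collections import Counter

# Hypothetical model output. The [y, x] layout and the 0-1000 normalization
# are assumptions made for this sketch, not details confirmed by the article.
SAMPLE_RESPONSE = """
[
  {"point": [500, 250], "label": "hammer"},
  {"point": [100, 900], "label": "pliers"},
  {"point": [750, 400], "label": "pliers"}
]
"""


def to_pixels(point, width, height, scale=1000):
    """Convert a normalized [y, x] point to integer (x, y) pixel coordinates."""
    y, x = point
    return (round(x / scale * width), round(y / scale * height))


def count_by_label(response_text, requested_labels):
    """Count returned points per requested label; labels with no points get 0."""
    points = json.loads(response_text)
    counts = Counter(p["label"] for p in points)
    return {label: counts.get(label, 0) for label in requested_labels}


# "scissors" is requested but never pointed at, so its count stays at zero.
counts = count_by_label(SAMPLE_RESPONSE, ["hammer", "pliers", "scissors"])
pixels = [
    to_pixels(p["point"], width=640, height=480)
    for p in json.loads(SAMPLE_RESPONSE)
]
```

Reporting an explicit zero for a requested label keeps "not present" distinct from "not asked about", which mirrors the false-positive behavior described above.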

A new instrument-reading capability extends the model's sensing abilities. Developed in partnership with Boston Dynamics, it lets robots interpret complex analog gauges and sight glasses, a practical requirement for industrial and facility monitoring, where manual reading is a bottleneck.

The model functions as a high-level reasoning layer that can coordinate with other systems. It natively supports tool calling, including integration with Google Search for information retrieval and connection to vision-language-action models—specialized systems that map visual inputs to motor commands. This architecture allows it to orchestrate multi-step tasks requiring both reasoning and direct physical control.
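The layered arrangement described here can be sketched as a small dispatch loop: a reasoning layer produces a plan of tool calls, and a dispatcher routes each step to the matching tool. Everything below is hypothetical, including the stub tools, the `TOOLS` registry, and the `(tool_name, argument)` plan format. It is not the Gemini API, only an illustration of the pattern.

```python
from typing import Callable, Dict, List, Tuple


def search_stub(query: str) -> str:
    """Stand-in for an information-retrieval tool (hypothetical)."""
    return f"search results for: {query}"


def vla_stub(instruction: str) -> str:
    """Stand-in for a vision-language-action model that would drive motors (hypothetical)."""
    return f"executed: {instruction}"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_stub,
    "vla": vla_stub,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute each (tool_name, argument) step in order and collect the results."""
    results = []
    for tool_name, argument in plan:
        if tool_name not in TOOLS:
            raise KeyError(f"unknown tool: {tool_name}")
        results.append(TOOLS[tool_name](argument))
    return results


# In a real system the plan would come from the reasoning model; here it is hard-coded.
plan = [("search", "how to sort recycling"), ("vla", "pick up the can")]
log = run_plan(plan)
```

Keeping the reasoning step separate from the tool implementations means a motor-control model or a search backend can be swapped without changing the planning logic.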

Benchmark comparisons show measurable improvements against both the prior Robotics-ER version and the general-purpose Gemini 3.0 Flash model, particularly in spatial and physical reasoning dimensions. The model is now available through the Gemini API and Google AI Studio, with developer documentation and example code provided to facilitate implementation.

Sources

01 Google DeepMind, Blog: Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
