Spot read a handwritten to-do list off a whiteboard — and just did it.

A video dropped in March showing Boston Dynamics' quadruped robot Spot putting shoes on a rack and clearing empty cans from a living room. The demo itself is interesting enough — but what's actually fascinating is the architecture behind it. The engineers didn't write control code. What they wrote was a set of natural-language prompts — and Gemini Robotics handled the sequencing, failure recovery, and retries.

What Is It?

Two outputs from the Boston Dynamics and Google DeepMind partnership dropped within a month of each other.

  1. "To Do List with Spot": an experimental demo where Spot performs household tasks using natural-language commands. Gemini Robotics-ER 1.5 orchestrates the work by calling Spot's SDK.
  2. AIVI-Learning: a commercial product that integrates Gemini Robotics-ER 1.6 into Spot's industrial visual inspection feature, officially released and auto-activated for all existing AIVI customers on April 8, 2026.

The short version: the way robots get their instructions is shifting from code to natural language — and not just in demos. On actual factory floors.

  • Gauge recognition: 23% → 98%
    That's the jump from Gemini Robotics-ER 1.5 to 1.6 + agentic vision. For reference, Gemini 3.0 Flash alone sits at 67%.
  • Time writing state machine code → time writing natural-language prompts
    Instead of defining each step in code, you describe what each tool does in plain English — and the LLM figures out the sequence.
  • Operators can now see why the robot made each call (Transparent Reasoning)
    AIVI doesn't just hand back results — it exposes its reasoning. That's a critical shift for safety-regulated industries.
  • Zero-Downtime Upgrades
    The model updates automatically in the cloud. Inspection accuracy improves on its own — no firmware flashing, no downtime.

If You're Not Writing Code, How Does It Know What to Do?

Traditional robot programming is all state machines. Move → activate camera → detect object → grip → move → release — every step hand-coded, including branching logic, error handling, and retries. When the environment changes, you rewrite the code.
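
For contrast, here's a minimal sketch of what that hard-coded style looks like in Python. Every identifier here is hypothetical, an illustration of the pattern rather than the actual Spot SDK:

```python
# A sketch of the traditional approach: every step, branch, and retry is
# hand-coded. All identifiers here are hypothetical, not the Spot SDK.

def fetch_and_shelve(robot, target):
    robot.go_to(target.location)
    image = robot.camera("gripper").take_picture()
    detection = robot.detect(image, target.label)
    if detection is None:
        # Every failure path is hand-written, and so is every retry.
        robot.go_to(target.fallback_location)
        image = robot.camera("front").take_picture()
        detection = robot.detect(image, target.label)
        if detection is None:
            raise RuntimeError(f"could not find {target.label}")
    robot.grip(detection)
    robot.go_to(target.shelf)
    robot.release()
```

When the environment changes, every one of those branches is a line of code someone has to rewrite.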

The Boston Dynamics team did something different. They built a "tool layer" on top of the SDK, and described what each tool does in natural-language prompts. Here's what the actual prompt for the "TakePicture" tool looks like.

"This command takes a photo using the specified camera. Camera selection has some nuance. Immediately after arriving with GoTo, always start with the gripper camera — it's the most information-rich. If the robot is already holding something, either (1) call PutDown immediately or (2) use the front camera to survey the area. Note: the front camera is mounted low, making it unsuitable for photographing objects at height."

That's not a line of code. It's a description of the robot's physical limitations, written in plain language. And it works. On demo day, Spot read a handwritten to-do list off a whiteboard with its camera, then called the appropriate tools item by item until everything was done.
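
Here's a minimal sketch of how a tool layer like that might be wired up in Python. The registry shape is an assumption for illustration, and the TakePicture description paraphrases the quoted prompt; none of this is Boston Dynamics' actual implementation:

```python
# A sketch of a tool layer: each SDK call is wrapped as a tool whose behavior
# and physical constraints are described in plain English. The LLM is prompted
# with the descriptions, not the code. Names and shapes are illustrative.

TOOLS = {
    "GoTo": {
        "fn": lambda robot, place: robot.go_to(place),
        "description": "Walks the robot to a named location.",
    },
    "TakePicture": {
        "fn": lambda robot, camera: robot.camera(camera).take_picture(),
        "description": (
            "Takes a photo with the specified camera. Immediately after GoTo, "
            "start with the gripper camera; it is the most information-rich. "
            "The front camera is mounted low and unsuitable for objects at height."
        ),
    },
}

def tool_manifest() -> str:
    """Render the plain-language tool descriptions the LLM plans against."""
    return "\n".join(f"{name}: {tool['description']}" for name, tool in TOOLS.items())
```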

Here's what that shift looks like in practice.

| Metric | Before (Code-Based Automation) | Spot × Gemini Robotics |
| --- | --- | --- |
| Adding a new task | Write, test, and deploy state machine code | Add a natural-language prompt — demo immediately |
| Gauge recognition accuracy | Traditional vision model / Gemini Robotics-ER 1.5: 23% | Gemini Robotics-ER 1.6 + agentic vision: 98% |
| Model updates | Requires firmware update + downtime | Zero-downtime, auto-deployed via cloud |
| Checking decision rationale | Black box | Transparent Reasoning (reasoning steps exposed) |
| Handling failures | Humans write all the exception-handling code | Tool responds in plain language ("hands full, can't pick up") → LLM replans |

That last row is the point. When a tool returns a plain-language result — "picked up the object," "hands full, can't pick it up" — Gemini Robotics reads it and replans. You don't have to pre-code every edge case. That's the real shift.
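
A minimal sketch of that replanning loop, reusing the TOOLS registry from the earlier sketch. The llm.next_action call stands in for whatever planning interface sits on top; it is an assumption, not the real Gemini Robotics API:

```python
# A sketch of the replanning loop: tools return plain-language results, and
# the model reads them before picking its next call. `llm.next_action` is an
# assumed planning interface; TOOLS and tool_manifest come from the earlier
# tool-layer sketch.

def run_task(llm, robot, goal, max_steps=20):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm.next_action(tool_manifest(), history)  # choose a tool + args
        if action.name == "Done":
            return history
        try:
            result = TOOLS[action.name]["fn"](robot, *action.args)
            history.append(f"{action.name} -> {result}")    # e.g. "picked up the can"
        except Exception as err:
            # Failures flow back as plain language; no pre-coded branch needed.
            history.append(f"{action.name} failed: {err}")  # e.g. "hands full"
    raise TimeoutError("task did not finish within the step budget")
```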

What Changes on the Factory Floor?

The home-cleaning demo isn't interesting because it's cool. It's interesting because the exact same architecture was applied to industrial settings (AIVI-Learning).

Spot runs regular inspection rounds in auto plants, power facilities, and distribution centers. Each round covers hundreds of assets: analog gauges (pressure, temperature), sight glasses (fluid levels inside tanks), conveyor belt wear, oil leaks, and 5S compliance status. Reading these accurately takes more than object recognition; it requires complex visual reasoning.

| Inspection Item | Before | After Gemini Robotics-ER 1.6 Integration |
| --- | --- | --- |
| Analog gauge / sight glass 0–100% reading | Object detection only — couldn't extract values | Accurate value extraction |
| 5S compliance auditing | Manual inspection | Automated (replaces multi-shift staffing) |
| Pallet counting | Manual / separate vision system | Handled directly by AIVI |
| Pooling water / unauthorized personnel detection | Periodic human patrol | Site View alerts (next release) |
| Model updates | Requires on-site downtime | Cloud auto-update, zero downtime |

The 23%→98% gauge-reading jump isn't just a number. That's the threshold where automated inspection actually becomes feasible without a human double-checking every result. At 23%, you still need someone following the robot around. At 98%, you only look at the exceptions.

One more thing worth noting: "agentic vision" is a new concept here. The model doesn't just glance at an image and return an answer — it marks points, crops regions, and zooms back in through a "scratchpad" process, reasoning step by step. It's basically what a human does when they lean in to read a gauge more carefully.
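
Here's a toy sketch of what that scratchpad loop could look like: localize, crop, zoom, then read. The vlm.locate and vlm.read_value calls are placeholders, not a published API; only the PIL calls are real:

```python
# A toy version of the scratchpad loop: the model asks for a crop and a zoom
# before answering, the digital equivalent of leaning in. `vlm.locate` and
# `vlm.read_value` are assumed hooks onto a vision-language model.

from PIL import Image

def read_gauge(vlm, image_path):
    scene = Image.open(image_path)
    # Ask the model where the gauge is in the full frame.
    box = vlm.locate(scene, "analog pressure gauge")  # (left, top, right, bottom)
    # Crop the region and upscale it before taking a second, closer look.
    gauge = scene.crop(box).resize((512, 512))
    # Read the needle from the zoomed crop, reasoning step by step.
    return vlm.read_value(gauge, "needle position as a 0-100% reading")
```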

What Should Companies Be Watching Here?

Spot is already deployed at major conglomerates — Hanwha, Hyundai, SK (Hyundai Motor Group owns Boston Dynamics). This isn't a distant scenario. The deciding factor in robot adoption could shift from the cost of writing code to the ability to design and operate prompts.

  1. The job of the robot workflow designer changes
    The question shifts from "how do we code each step" to "what tools do we build so the LLM can combine them freely." Designing a tool's inputs, outputs, and error messages in natural language becomes the core operational competency.
  2. Log the reasoning behind every inspection
    With Transparent Reasoning on, every inspection generates a record of the LLM's decision rationale (see the sketch after this list). As industrial safety regulations increasingly require documented justification for automated decisions, this log becomes your primary evidence.
  3. Assess your cloud model dependency explicitly
    Zero-Downtime Upgrades are convenient, but they also mean the model changes automatically. In industries with strict Management of Change protocols (power plants, semiconductor fabs), your procedures need to account for the fact that today's model may not be the one running tomorrow's inspection.
  4. Review the data-sharing terms
    Using AIVI-Learning requires sharing facility data with Boston Dynamics (limited to internal BD use). Get legal and security to review whether sensitive industrial data, such as semiconductor processes and plant schematics, can be part of that data flow.
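
Here's a sketch of what an inspection record built on that idea might capture. The field names are assumptions; AIVI's actual log schema isn't public in the sources below:

```python
# A sketch of an inspection record that treats Transparent Reasoning output
# as audit evidence. Field names are assumptions, not AIVI's actual schema.

import datetime
import json

def log_inspection(asset_id, verdict, reasoning_steps, model_version,
                   path="inspection_audit.jsonl"):
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "asset_id": asset_id,            # e.g. "pump-7-pressure-gauge"
        "verdict": verdict,              # e.g. "reading 62%, within range"
        "reasoning": reasoning_steps,    # the model's exposed step-by-step rationale
        "model_version": model_version,  # which model version made the call
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```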

Getting Started

  1. Reframe your existing automation as prompt + tool layer
    If you have robot or RPA workflows, break your state machine logic into discrete tools and write down each tool's purpose, constraints, and failure modes in plain language. That exercise alone is the first real step toward LLM-based automation.
  2. Set your accuracy threshold at "human-free" levels
    Treat numbers like the 98% from Gemini Robotics-ER 1.6 as your decision threshold. Below 80%, someone has to verify every result and the ROI isn't there. Start evaluating deployment for tasks where you can realistically hit 95%+.
  3. Use Transparent Reasoning as an operational log
    When deploying automated inspection, don't just capture results; store the reasoning too. If something goes wrong, that log is your root-cause analysis and your regulatory defense.
  4. Build model auto-update policy into your operating procedures
    Zero-downtime is convenient, but in safety-critical industries you need to track which model version performed each inspection. Add model version logging to your operating procedures (a sketch follows this list).
  5. Separate shareable data from proprietary data
    Not all inspection data should flow into external model training. Keep general safety checks and sensitive proprietary data (processes, schematics) on separate workflows.
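
And a sketch of the version-tracking idea from item 4: pin the last model version approved under your Management of Change process and flag any silent cloud update. The get_version and alert arguments are placeholders for platform- and site-specific hooks:

```python
# A sketch of model-version tracking under auto-updates. `get_version` and
# `alert` are assumed hooks; how the platform actually reports its model
# version is not documented in the sources below.

APPROVED_VERSION = "gemini-robotics-er-1.6"  # last version signed off under MoC

def check_model_version(get_version, alert):
    current = get_version()
    if current != APPROVED_VERSION:
        # Auto-updates mean this can change without any on-site action;
        # route the change through MoC review before trusting new results.
        alert(f"model changed: {APPROVED_VERSION} -> {current}; review required")
    return current
```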

Deep Dive Resources

Boston Dynamics — Tools for Your To Do List with Spot and Gemini Robotics. BD engineers walk through the internal architecture of the home-cleaning demo: real tool prompt examples, the SDK integration approach, and honest coverage of limitations. (bostondynamics.com)

Ars Technica — Robot dogs now read gauges and thermometers using Google Gemini. An accessible breakdown of the 23%→98% jump and the agentic vision concept, written for non-specialists, with the safety model implications covered too. (arstechnica.com)

The Robot Report — BD and Google DeepMind use Gemini to make Spot smarter. The AIVI-Learning commercial launch in one place: Zero-Downtime Upgrade, Transparent Reasoning, and expanded asset support in a single read. (therobotreport.com)