Spot read a handwritten to-do list off a whiteboard — and just did it.

A video dropped in March showing Boston Dynamics' quadruped robot Spot putting shoes on a rack and clearing empty cans from a living room. The demo itself is interesting enough — but what's actually fascinating is the architecture behind it. The engineers didn't write control code. What they wrote was a set of natural-language prompts — and Gemini Robotics handled the sequencing, failure recovery, and retries.

What Is It?

Two outputs from the Boston Dynamics and Google DeepMind partnership dropped within a month of each other.

  1. "To Do List with Spot": an experimental demo where Spot performs household tasks using natural-language commands. Gemini Robotics-ER 1.5 orchestrates the work by calling Spot's SDK.
  2. AIVI-Learning: a commercial product that integrates Gemini Robotics-ER 1.6 into Spot's industrial visual inspection feature, officially released and auto-activated for all existing AIVI customers on April 8, 2026.

The short version: the way robots get their instructions is shifting from code to natural language — and not just in demos. On actual factory floors.

  • Gauge recognition: 23% → 98%
    That's the jump from Gemini Robotics-ER 1.5 to 1.6 + agentic vision. For reference, Gemini 3.0 Flash alone sits at 67%.
  • Time writing state machine code → time writing natural-language prompts
    Instead of defining each step in code, you describe what each tool does in plain English — and the LLM figures out the sequence.
  • Operators can now see why the robot made each call (Transparent Reasoning)
    AIVI doesn't just hand back results — it exposes its reasoning. That's a critical shift for safety-regulated industries.
  • Zero-Downtime Upgrades
    The model updates automatically in the cloud. Inspection accuracy improves on its own — no firmware flashing, no downtime.

If You're Not Writing Code, How Does It Know What to Do?

Traditional robot programming is all state machines. Move → activate camera → detect object → grip → move → release — every step hand-coded, including branching logic, error handling, and retries. When the environment changes, you rewrite the code.
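
For contrast, here's a minimal sketch of what that hard-coded style looks like in Python. Every identifier here is hypothetical, an illustration of the pattern rather than the actual Spot SDK:

```python
# A sketch of the traditional approach: every step, branch, and retry is
# hand-coded. All identifiers here are hypothetical, not the Spot SDK.

def fetch_and_shelve(robot, target):
    robot.go_to(target.location)
    image = robot.camera("gripper").take_picture()
    detection = robot.detect(image, target.label)
    if detection is None:
        # Every failure path is hand-written, and so is every retry.
        robot.go_to(target.fallback_location)
        image = robot.camera("front").take_picture()
        detection = robot.detect(image, target.label)
        if detection is None:
            raise RuntimeError(f"could not find {target.label}")
    robot.grip(detection)
    robot.go_to(target.shelf)
    robot.release()
```

When the environment changes, every one of those branches is a line of code someone has to rewrite.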

The Boston Dynamics team did something different. They built a "tool layer" on top of the SDK, and described what each tool does in natural-language prompts. Here's what the actual prompt for the "TakePicture" tool looks like.

"This command takes a photo using the specified camera. Camera selection has some nuance. Immediately after arriving with GoTo, always start with the gripper camera — it's the most information-rich. If the robot is already holding something, either (1) call PutDown immediately or (2) use the front camera to survey the area. Note: the front camera is mounted low, making it unsuitable for photographing objects at height."

That's not a line of code. It's a description of the robot's physical limitations, written in plain language. And it works. On demo day, Spot read a handwritten to-do list off a whiteboard with its camera, then called the appropriate tools item by item until everything was done.
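
Here's a minimal sketch of how a tool layer like that might be wired up in Python. The registry shape is an assumption for illustration, and the TakePicture description paraphrases the quoted prompt; none of this is Boston Dynamics' actual implementation:

```python
# A sketch of a tool layer: each SDK call is wrapped as a tool whose behavior
# and physical constraints are described in plain English. The LLM is prompted
# with the descriptions, not the code. Names and shapes are illustrative.

TOOLS = {
    "GoTo": {
        "fn": lambda robot, place: robot.go_to(place),
        "description": "Walks the robot to a named location.",
    },
    "TakePicture": {
        "fn": lambda robot, camera: robot.camera(camera).take_picture(),
        "description": (
            "Takes a photo with the specified camera. Immediately after GoTo, "
            "start with the gripper camera; it is the most information-rich. "
            "The front camera is mounted low and unsuitable for objects at height."
        ),
    },
}

def tool_manifest() -> str:
    """Render the plain-language tool descriptions the LLM plans against."""
    return "\n".join(f"{name}: {tool['description']}" for name, tool in TOOLS.items())
```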

Here's what that shift looks like in practice.

| Metric | Before (Code-Based Automation) | Spot × Gemini Robotics |
| --- | --- | --- |
| Adding a new task | Write, test, and deploy state machine code | Add a natural-language prompt — demo immediately |
| Gauge recognition accuracy | Traditional vision model / Gemini Robotics-ER 1.5: 23% | Gemini Robotics-ER 1.6 + agentic vision: 98% |
| Model updates | Requires firmware update + downtime | Zero-downtime, auto-deployed via cloud |
| Checking decision rationale | Black box | Transparent Reasoning (reasoning steps exposed) |
| Handling failures | Humans write all the exception-handling code | Tool responds in plain language ("hands full, can't pick up") → LLM replans |

That last row is the point. When a tool returns a plain-language result — "picked up the object," "hands full, can't pick it up" — Gemini Robotics reads it and replans. You don't have to pre-code every edge case. That's the real shift.
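
A minimal sketch of that replanning loop, reusing the TOOLS registry from the earlier sketch. The llm.next_action call stands in for whatever planning interface sits on top; it is an assumption, not the real Gemini Robotics API:

```python
# A sketch of the replanning loop: tools return plain-language results, and
# the model reads them before picking its next call. `llm.next_action` is an
# assumed planning interface; TOOLS and tool_manifest come from the earlier
# tool-layer sketch.

def run_task(llm, robot, goal, max_steps=20):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm.next_action(tool_manifest(), history)  # choose a tool + args
        if action.name == "Done":
            return history
        try:
            result = TOOLS[action.name]["fn"](robot, *action.args)
            history.append(f"{action.name} -> {result}")    # e.g. "picked up the can"
        except Exception as err:
            # Failures flow back as plain language; no pre-coded branch needed.
            history.append(f"{action.name} failed: {err}")  # e.g. "hands full"
    raise TimeoutError("task did not finish within the step budget")
```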

What Changes on the Factory Floor?

The home-cleaning demo isn't interesting because it's cool. It's interesting because the exact same architecture was applied to industrial settings (AIVI-Learning).

Spot runs regular inspection rounds in auto plants, power facilities, and distribution centers. Each round covers hundreds of assets: analog gauges (pressure, temperature), sight glasses (fluid levels inside tanks), conveyor belt wear, oil leaks, and 5S compliance status. Reading these accurately takes more than object recognition; it requires complex visual reasoning.

| Inspection Item | Before | After Gemini Robotics-ER 1.6 Integration |
| --- | --- | --- |
| Analog gauge / sight glass 0–100% reading | Object detection only — couldn't extract values | Accurate value extraction |
| 5S compliance auditing | Manual inspection | Automated (replaces multi-shift staffing) |
| Pallet counting | Manual / separate vision system | Handled directly by AIVI |
| Pooling water / unauthorized personnel detection | Periodic human patrol | Site View alerts (next release) |
| Model updates | Requires on-site downtime | Cloud auto-update, zero downtime |

The 23%→98% gauge-reading jump isn't just a number. That's the threshold where automated inspection actually becomes feasible without a human double-checking every result. At 23%, you still need someone following the robot around. At 98%, you only look at the exceptions.

One more thing worth noting: "agentic vision" is a new concept here. The model doesn't just glance at an image and return an answer — it marks points, crops regions, and zooms back in through a "scratchpad" process, reasoning step by step. It's basically what a human does when they lean in to read a gauge more carefully.
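
Here's a toy sketch of what that scratchpad loop could look like: localize, crop, zoom, then read. The vlm.locate and vlm.read_value calls are placeholders, not a published API; only the PIL calls are real:

```python
# A toy version of the scratchpad loop: the model asks for a crop and a zoom
# before answering, the digital equivalent of leaning in. `vlm.locate` and
# `vlm.read_value` are assumed hooks onto a vision-language model.

from PIL import Image

def read_gauge(vlm, image_path):
    scene = Image.open(image_path)
    # Ask the model where the gauge is in the full frame.
    box = vlm.locate(scene, "analog pressure gauge")  # (left, top, right, bottom)
    # Crop the region and upscale it before taking a second, closer look.
    gauge = scene.crop(box).resize((512, 512))
    # Read the needle from the zoomed crop, reasoning step by step.
    return vlm.read_value(gauge, "needle position as a 0-100% reading")
```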

What Should Companies Be Watching Here?

Spot is already deployed at major conglomerates — Hanwha, Hyundai, SK (Hyundai Motor Group owns Boston Dynamics). This isn't a distant scenario. The deciding factor in robot adoption could shift from the cost of writing code to the ability to design and operate prompts.

  1. The job of the robot workflow designer changes
    The question shifts from "how do we code each step" to "what tools do we build so the LLM can combine them freely." Designing a tool's inputs, outputs, and error messages in natural language becomes the core operational competency.
  2. Log the reasoning behind every inspection
    With Transparent Reasoning on, every inspection generates a record of the LLM's decision rationale (see the sketch after this list). As industrial safety regulations increasingly require documented justification for automated decisions, this log becomes your primary evidence.
  3. Assess your cloud model dependency explicitly
    Zero-Downtime Upgrades are convenient, but they also mean the model changes automatically. In industries with strict Management of Change protocols (power plants, semiconductor fabs), your procedures need to account for the fact that today's model may not be the one running tomorrow's inspection.
  4. Review the data-sharing terms
    Using AIVI-Learning requires sharing facility data with Boston Dynamics (limited to internal BD use). Get legal and security to review whether sensitive industrial data, such as semiconductor processes and plant schematics, can be part of that data flow.
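
Here's a sketch of what an inspection record built on that idea might capture. The field names are assumptions; AIVI's actual log schema isn't public in the sources below:

```python
# A sketch of an inspection record that treats Transparent Reasoning output
# as audit evidence. Field names are assumptions, not AIVI's actual schema.

import datetime
import json

def log_inspection(asset_id, verdict, reasoning_steps, model_version,
                   path="inspection_audit.jsonl"):
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "asset_id": asset_id,            # e.g. "pump-7-pressure-gauge"
        "verdict": verdict,              # e.g. "reading 62%, within range"
        "reasoning": reasoning_steps,    # the model's exposed step-by-step rationale
        "model_version": model_version,  # which model version made the call
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```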

Getting Started

  1. Reframe your existing automation as prompt + tool layer
    If you have robot or RPA workflows, break your state machine logic into discrete tools and write down each tool's purpose, constraints, and failure modes in plain language. That exercise alone is the first real step toward LLM-based automation.
  2. Set your accuracy threshold at "human-free" levels
    Treat numbers like the 98% from Gemini Robotics-ER 1.6 as your decision threshold. Below 80%, someone has to verify every result and the ROI isn't there. Start evaluating deployment for tasks where you can realistically hit 95%+.
  3. Use Transparent Reasoning as an operational log
    When deploying automated inspection, don't just capture results; store the reasoning too. If something goes wrong, that log is your root-cause analysis and your regulatory defense.
  4. Build model auto-update policy into your operating procedures
    Zero-downtime is convenient, but in safety-critical industries you need to track which model version performed each inspection. Add model version logging to your operating procedures (a sketch follows this list).
  5. Separate shareable data from proprietary data
    Not all inspection data should flow into external model training. Keep general safety checks and sensitive proprietary data (processes, schematics) on separate workflows.
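
And a sketch of the version-tracking idea from item 4: pin the last model version approved under your Management of Change process and flag any silent cloud update. The get_version and alert arguments are placeholders for platform- and site-specific hooks:

```python
# A sketch of model-version tracking under auto-updates. `get_version` and
# `alert` are assumed hooks; how the platform actually reports its model
# version is not documented in the sources below.

APPROVED_VERSION = "gemini-robotics-er-1.6"  # last version signed off under MoC

def check_model_version(get_version, alert):
    current = get_version()
    if current != APPROVED_VERSION:
        # Auto-updates mean this can change without any on-site action;
        # route the change through MoC review before trusting new results.
        alert(f"model changed: {APPROVED_VERSION} -> {current}; review required")
    return current
```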

Deep Dive Resources

Boston Dynamics — Tools for Your To Do List with Spot and Gemini Robotics. BD engineers walk through the internal architecture of the home-cleaning demo: real tool prompt examples, the SDK integration approach, and honest coverage of limitations. (bostondynamics.com)

Ars Technica — Robot dogs now read gauges and thermometers using Google Gemini. An accessible breakdown of the 23%→98% jump and the agentic vision concept, written for non-specialists, with the safety model implications covered too. (arstechnica.com)

The Robot Report — BD and Google DeepMind use Gemini to make Spot smarter. The AIVI-Learning commercial launch in one place: Zero-Downtime Upgrade, Transparent Reasoning, and expanded asset support in a single read. (therobotreport.com)