Two companies running the same AI workload now pay 2x different bills.

Not because one uses pricier models. Because one runs everything on a single model. Per Augment Code's April 2026 cost model, a 200-call coding session costs $2.02 on Opus 4.6 alone, but $0.98 when the same work is split across 4 roles. 51% of the gap comes from model placement, not model price.

What's wrong with one model for everything?

A single model creates two failure modes at once. Over-provisioning on simple tasks (wasted spend), under-provisioning on complex ones (wasted quality). The same model fails in both directions simultaneouslythat's the core issue.

Concrete cost gap, per Anthropic's April 2026 pricing.

Model Input ($/M tok) Output ($/M tok) Best for
Opus 4.6 $5.00 $25.00 Complex reasoning, architecture decisions
Sonnet 4.6 $3.00 $15.00 General code generation, multi-file work
Haiku 4.5 $1.00 $5.00 File search, simple edits, linting

Opus and Haiku are 5x apart on input, 5x apart on output. But more than half the 200 calls a coding agent makes are pattern-matching tasks — grep, directory listing, import tracing. Routing those through Opus is driving a Ferrari to the corner store.

A DEV Community analysis found 70% of coding agent tokens are waste — too many file reads, redundant searches, verbose tool output. Moving that 70% to Haiku alone is a 5x cost cut.

So what exactly are the "4 roles"?

Through 2026, Anthropic, OpenAI, Augment Code, and CrewAI all converged on the same pattern: 4-role routing. Every coding agent task gets classified into one of four roles, and each role gets a different model.

  1. Coordinator — Opus 4.6
    Decomposes requirements into subtasks and orchestrates downstream agents. The role that demands the deepest reasoning. Mistakes here cascade through every downstream task. SWE-bench Verified 80.84%, top score on MCP Atlas tool-use benchmark.
  2. Implementor — Sonnet 4.6
    Actual code generation, multi-file edits, test writing. 67% cheaper output tokens than Opus per generation. SWE-bench 79.6% — only 1.2 points behind Opus.
  3. Navigator — Haiku 4.5
    File search, grep, symbol resolution, boilerplate. 5x cheaper input and output than Opus. For pattern-matching tasks, the quality gap with Sonnet is barely measurable.
  4. Reviewer — GPT-5.2
    Async code review, security analysis. More tool calls = deeper investigation. The DryRun Security report shows Codex (GPT-5.2) with -1 issues, Claude with +4. Review is a "thoroughness over speed" job.

How big is the actual cost gap?

Augment Code's published 200-call session simulation. Same workload, two routing approaches.

Task type Frequency Single Opus 4-role routing
Architecture planning 1x $0.140 $0.140 (Opus)
Complex implementation 3x $0.780 $0.468 (Sonnet)
Quick edits 8x $0.420 $0.084 (Haiku)
Code review 4x $0.300 $0.060 (Haiku)
Test generation 4x $0.380 $0.228 (Sonnet)
Session total 20 $2.02 $0.98 (-51%)

The biggest cuts are in quick edits and code review. $0.72 → $0.14, accounting for 56% of the total savings. AWS Bedrock's Intelligent Prompt Routing reports up to 30% savings, and Anthropic and OpenAI both stack a 50% batch discount on async work.

Static, Dynamic, or Hybrid — which routing should you pick?

Even after deciding the role split, three approaches coexist for how tasks get routed. Each fits a different situation.

Approach Best for Latency added Setup difficulty
Static (preset rules) Fixed-role pipelines None Low — assign model per agent
Dynamic (RouteLLM, etc.) Variable difficulty within a role 50–200ms/call Medium — train a classifier
Hybrid (OpenAI pattern) Planner picks the executor Planning step only Medium — planner + pool

Under 500 calls a day, Static is the most efficient. Dynamic routing's classifier overhead eats the savings. Claude Code's sub-agents API and CrewAI's LLM instance pattern are both Static, and most solo and small teams start there.

Routing trap to watch — chasing maximum savings by routing everything to Haiku triggers a retry explosion. If more than 20% of Haiku output needs Sonnet/Opus correction, the 5x price gap collapses. For the first week, log per-task error rates, then promote any task type past that threshold.

Just the essentials: how to start

  1. Break down a week of token usage by task type
    From your Claude Code or Cursor logs, classify usage into 5 types (architecture / implementation / edits / review / tests). The biggest token sink is where routing has the highest ROI.
  2. Move the highest-frequency task type to Haiku
    Usually file search, grep, and linting. Run for a week and measure the share of Haiku output you accept as-is. Above 80% — keep it. Below — promote to Sonnet.
  3. Never downgrade the Coordinator slot
    Bad task decomposition makes every downstream agent waste tokens. Opus's 15–19 point MCP Atlas lead over Sonnet is exactly this difference. Drop to Sonnet only for fast prototype loops.
  4. Cap your agent at 25 iterations
    Most token waste comes from agent loops (same attempt repeated), not routing. Aider, Cline, and Claude Code all support max-iterations. If 25 tries can't solve it, 50 won't either.