Two companies running the same AI workload can now pay bills that differ by 2x.
Not because one uses pricier models. Because one runs everything on a single model. Per Augment Code's April 2026 cost model, a 200-call coding session costs $2.02 on Opus 4.6 alone but $0.98 when the same work is split across 4 roles. That 51% cut comes from model placement, not model price.
What's wrong with one model for everything?
A single model creates two failure modes at once: over-provisioning on simple tasks (wasted spend) and under-provisioning on complex ones (lost quality). The same model fails in both directions simultaneously, and that's the core issue.
Here is the concrete cost gap, per Anthropic's April 2026 pricing:
| Model | Input ($/M tok) | Output ($/M tok) | Best for |
|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | Complex reasoning, architecture decisions |
| Sonnet 4.6 | $3.00 | $15.00 | General code generation, multi-file work |
| Haiku 4.5 | $1.00 | $5.00 | File search, simple edits, linting |
Opus and Haiku are 5x apart on both input and output. Yet more than half of the 200 calls a coding agent makes in a session are pattern-matching tasks: grep, directory listing, import tracing. Routing those through Opus is like driving a Ferrari to the corner store.
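To make the ratio concrete, here is a minimal Python sketch that prices a single call from the table above. The model keys and the 8,000-in / 500-out token counts are illustrative assumptions (not official API model IDs), and it ignores prompt caching and batch discounts.

```python
# List prices in USD per million tokens, copied from the April 2026 table above.
PRICES = {
    "opus-4.6": (5.00, 25.00),    # (input, output)
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call at list price (no caching or batch discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical grep-style call: 8,000 tokens of context in, 500 tokens out.
opus = call_cost("opus-4.6", 8_000, 500)
haiku = call_cost("haiku-4.5", 8_000, 500)
print(f"Opus ${opus:.4f} vs Haiku ${haiku:.4f} ({opus / haiku:.1f}x)")
```

Because Opus is exactly 5x Haiku on both input and output, the ratio stays 5x for any token mix; only the absolute dollar amounts change.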
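A DEV Community analysis found that 70% of coding-agent tokens are waste: too many file reads, redundant searches, verbose tool output. Moving that 70% to Haiku cuts its cost 5x, which on its own brings the total bill down to 44% of the original (0.30 + 0.70 / 5 = 0.44), a 56% saving, assuming those tokens were billed at Opus rates.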
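The blended arithmetic behind that figure can be written as a small function. This is a sketch under one stated assumption: the moved tokens keep the same input/output mix as the rest, so the 5x Opus-to-Haiku price ratio applies uniformly. The function name is hypothetical.

```python
def blended_cost_ratio(moved_share: float, price_ratio: float) -> float:
    """Fraction of the original bill left after moving `moved_share` of spend
    to a model that is `price_ratio` times cheaper.

    Assumes the moved tokens keep the same input/output mix, so a single
    price ratio applies to all of them.
    """
    if not 0.0 <= moved_share <= 1.0:
        raise ValueError("moved_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    # Untouched spend stays at full price; moved spend shrinks by price_ratio.
    return (1.0 - moved_share) + moved_share / price_ratio


# 70% of tokens moved from Opus to Haiku (5x cheaper on input and output).
remaining = blended_cost_ratio(0.70, 5.0)
print(f"Bill falls to {remaining:.0%} of the original ({1 - remaining:.0%} saved)")
```

A 5x cheaper model never yields a 5x cheaper bill unless all spend moves; the untouched 30% puts a floor under the total.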
So what exactly are the "4 roles"?
In 2026, Anthropic, OpenAI, Augment Code, and CrewAI have all converged on the same pattern: 4-role routing. Every coding-agent task is classified into one of four roles, and each role gets a different model.
- Coordinator: Opus 4.6
  Decomposes requirements into subtasks and orchestrates downstream agents. This role demands the deepest reasoning, because mistakes here cascade through every downstream task. SWE-bench Verified 80.84%, and the top score on the MCP Atlas tool-use benchmark.
- Implementor: Sonnet 4.6
  Actual code generation, multi-file edits, test writing. Output tokens cost 40% less than Opus ($15 vs $25 per million). SWE-bench 79.6%, only 1.2 points behind Opus.
- Navigator: Haiku 4.5
  File search, grep, symbol resolution, boilerplate. Input and output are both 5x cheaper than Opus. For pattern-matching tasks, the quality gap with Sonnet is barely measurable.
- Reviewer: GPT-5.2
  Async code review and security analysis. More tool calls mean deeper investigation. The DryRun Security report shows Codex (GPT-5.2) at -1 issues and Claude at +4. Review is a "thoroughness over speed" job.
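The role split above is, at its simplest, a lookup table. Here is a minimal Python sketch of it as configuration; the `Role` enum, the `Assignment` type, and the model strings are hypothetical labels for illustration, not any vendor's SDK types or official model IDs.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    COORDINATOR = "coordinator"
    IMPLEMENTOR = "implementor"
    NAVIGATOR = "navigator"
    REVIEWER = "reviewer"


@dataclass(frozen=True)
class Assignment:
    model: str                # illustrative label, not an official API model ID
    duties: tuple[str, ...]   # the kinds of work routed to this role


# Role -> model assignment as described in the 4-role list.
ROLE_MODELS: dict[Role, Assignment] = {
    Role.COORDINATOR: Assignment("opus-4.6", ("decompose requirements", "orchestrate agents")),
    Role.IMPLEMENTOR: Assignment("sonnet-4.6", ("code generation", "multi-file edits", "test writing")),
    Role.NAVIGATOR: Assignment("haiku-4.5", ("file search", "grep", "symbol resolution", "boilerplate")),
    Role.REVIEWER: Assignment("gpt-5.2", ("async code review", "security analysis")),
}


def model_for(role: Role) -> str:
    """Return the model assigned to a role."""
    return ROLE_MODELS[role].model


for role in Role:
    print(f"{role.value:<12} -> {model_for(role)}")
```

Keeping the mapping in one frozen table means changing a role's model is a one-line edit, which matters for the "promote to Sonnet" step later.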
How big is the actual cost gap?
Augment Code's published 200-call session simulation. Same workload, two routing approaches.
| Task type | Frequency | Single Opus | 4-role routing |
|---|---|---|---|
| Architecture planning | 1x | $0.140 | $0.140 (Opus) |
| Complex implementation | 3x | $0.780 | $0.468 (Sonnet) |
| Quick edits | 8x | $0.420 | $0.084 (Haiku) |
| Code review | 4x | $0.300 | $0.060 (Haiku) |
| Test generation | 4x | $0.380 | $0.228 (Sonnet) |
| Session total | 20 | $2.02 | $0.98 (-51%) |
The biggest cuts are in quick edits and code review: $0.72 drops to $0.14, more than half of the $1.04 saved. For comparison, AWS Bedrock's Intelligent Prompt Routing reports up to 30% savings, and Anthropic and OpenAI both offer a 50% batch discount that can be stacked on async work.
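The session figures can be checked directly. This Python sketch only re-adds the rows of the Augment Code table above; it introduces no new data.

```python
# (task type, Opus-only cost, routed cost, routed model) from the session table above.
SESSION = [
    ("Architecture planning", 0.140, 0.140, "Opus"),
    ("Complex implementation", 0.780, 0.468, "Sonnet"),
    ("Quick edits", 0.420, 0.084, "Haiku"),
    ("Code review", 0.300, 0.060, "Haiku"),
    ("Test generation", 0.380, 0.228, "Sonnet"),
]

single = sum(row[1] for row in SESSION)
routed = sum(row[2] for row in SESSION)
saved = single - routed
print(f"Single Opus ${single:.2f}, routed ${routed:.2f}, saved {saved / single:.0%}")

# Share of the total savings that comes from quick edits and code review.
top = {"Quick edits", "Code review"}
top_saved = sum(opus - cheap for name, opus, cheap, _ in SESSION if name in top)
print(f"Quick edits + review: {top_saved / saved:.0%} of savings")

# Per-row ratio of routed to Opus cost: Sonnet rows come out at 0.60x and
# Haiku rows at 0.20x, matching the list-price ratios in the pricing table.
for name, opus, cheap, model in SESSION:
    print(f"{name:<24} {model:<7} {cheap / opus:.2f}x")
```

The per-row ratios are a quick sanity check: every routed cost is exactly the Opus cost scaled by the target model's list-price ratio.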
Static, Dynamic, or Hybrid — which routing should you pick?
Once the role split is decided, there are still three ways to route tasks, each suited to a different situation.
| Approach | Best for | Latency added | Setup difficulty |
|---|---|---|---|
| Static (preset rules) | Fixed-role pipelines | None | Low — assign model per agent |
| Dynamic (RouteLLM, etc.) | Variable difficulty within a role | 50–200ms/call | Medium — train a classifier |
| Hybrid (OpenAI pattern) | Planner picks the executor | Planning step only | Medium — planner + pool |
Below 500 calls a day, Static routing is the most efficient, because Dynamic routing's classifier overhead eats the savings. Claude Code's sub-agents API and CrewAI's LLM instance pattern are both Static, and most solo developers and small teams start there.
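The three approaches differ mainly in where the routing decision is made. The Python sketch below contrasts them under loud assumptions: every function and model string is hypothetical, `toy_score` is a stand-in for a trained classifier (it is not RouteLLM's API), and the sketch does not model the added latency.

```python
from typing import Callable

# Static: a fixed task-type -> model table, decided ahead of time.
STATIC_RULES = {
    "architecture": "opus-4.6",
    "implementation": "sonnet-4.6",
    "tests": "sonnet-4.6",
    "edit": "haiku-4.5",
    "search": "haiku-4.5",
}


def route_static(task_type: str) -> str:
    # Unknown task types fall back to the mid-tier model instead of failing.
    return STATIC_RULES.get(task_type, "sonnet-4.6")


# Dynamic: a per-call difficulty score decides whether to escalate within a role.
def route_dynamic(prompt: str, score: Callable[[str], float], threshold: float = 0.5) -> str:
    return "sonnet-4.6" if score(prompt) >= threshold else "haiku-4.5"


def toy_score(prompt: str) -> float:
    # Hypothetical stand-in for a learned classifier: longer prompts count as harder.
    return min(len(prompt) / 2_000, 1.0)


# Hybrid: a planner (Opus, in this article's setup) proposes an executor per
# subtask; the router only checks the pick against an allowed pool.
POOL = {"sonnet-4.6", "haiku-4.5"}


def route_hybrid(planned: list[tuple[str, str]]) -> list[tuple[str, str]]:
    return [(task, model if model in POOL else "sonnet-4.6") for task, model in planned]


print(route_static("search"))
print(route_dynamic("rename variable x to count", toy_score))
print(route_hybrid([("list files", "haiku-4.5"), ("refactor auth", "opus-4.6")]))
```

Only the Dynamic path calls a scorer on every request, which is where its per-call overhead comes from; Static costs a dictionary lookup, and Hybrid pays once per planning step.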
Just the essentials: how to start
- Break down a week of token usage by task type
  From your Claude Code or Cursor logs, classify usage into 5 types (architecture / implementation / edits / review / tests). The biggest token sink is where routing has the highest ROI.
- Move the highest-frequency task type to Haiku
  Usually file search, grep, and linting. Run it for a week and measure the share of Haiku output you accept as-is. Above 80%, keep it; below, promote it to Sonnet.
- Never downgrade the Coordinator slot
  Bad task decomposition makes every downstream agent waste tokens. Opus's 15–19 point lead over Sonnet on MCP Atlas reflects exactly this difference. Drop to Sonnet only for fast prototype loops.
- Cap your agent at 25 iterations
  Most token waste comes from agent loops (the same attempt repeated), not from routing. Aider, Cline, and Claude Code all support a max-iterations setting. If 25 tries can't solve it, 50 won't either.
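Three of the steps above reduce to a few lines of logic each. This Python sketch assumes you have already extracted `(task_type, tokens)` pairs from your own logs; that record format and all function names are hypothetical, and it does not read any real Claude Code, Cursor, Aider, or Cline log or config format.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

TASK_TYPES = ("architecture", "implementation", "edits", "review", "tests")


def tokens_by_type(records: Iterable[tuple[str, int]]) -> Counter:
    """Step 1: sum token usage per task type from (task_type, tokens) pairs."""
    totals: Counter = Counter()
    for task_type, tokens in records:
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task_type}")
        totals[task_type] += tokens
    return totals


def keep_on_haiku(accepted: int, total: int, bar: float = 0.80) -> bool:
    """Step 2: keep a task type on Haiku only if more than `bar` of its outputs
    were accepted as-is; otherwise it should be promoted to Sonnet."""
    return total > 0 and accepted / total > bar


def run_capped(attempt: Callable[[int], bool], max_iterations: int = 25) -> Optional[int]:
    """Step 4: retry `attempt` at most `max_iterations` times.
    Returns the 1-based iteration that succeeded, or None if the cap was hit."""
    for i in range(1, max_iterations + 1):
        if attempt(i):
            return i
    return None


# Hypothetical week of usage: edits are the biggest sink, so route them first.
week = [("edits", 120_000), ("implementation", 90_000), ("edits", 60_000), ("review", 30_000)]
usage = tokens_by_type(week)
print(usage.most_common(1))
print(keep_on_haiku(accepted=85, total=100))
print(run_capped(lambda i: i == 3))
```

The iteration cap is deliberately a hard stop that returns `None` rather than raising, so the caller can decide whether to escalate the task to a stronger model or hand it back to a human.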




