The more tools you give an AI agent, the better it performs, right? It turns out the opposite is true. When tool access was restricted per phase at the protocol level, constrained models of 13B parameters and up consistently beat unconstrained frontier models.

30-second summary
Why agents fail → Define phases (States) → Enforce tool restrictions → Small model = frontier performance → Up to 80% cost savings

Why do agents keep making mistakes?

If you've used a coding agent, you know the feeling. When it works, it's incredible — but when it goes sideways, it really goes sideways. Is this just because the model isn't smart enough?

AI researcher Chip Huyen worked through the math. Even if an agent maintains 95% accuracy per step, the overall success rate drops to about 60% after 10 steps (0.95^10) and plummets to about 0.6% over 100 steps (0.95^100). Errors compound geometrically. That's a structural problem, not a model quality problem.
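
The arithmetic behind those figures is easy to check. Here is a minimal sketch, assuming every step succeeds independently with the same probability and that a single failed step fails the whole task. The function name `chain_success_rate` is made up for this illustration.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step_accuracy ** steps


for steps in (1, 10, 100):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>3} steps: {rate:.1%}")

# Prints:
#   1 steps: 95.0%
#  10 steps: 59.9%
# 100 steps: 0.6%
```

Real agents can sometimes recover from a bad step, so treat this as a simplified model of the trend rather than a prediction for any particular agent.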

Anthropic acknowledged this directly: "The autonomous nature of agents means higher costs, and the potential for compounding errors." The usual response? Use a bigger model. Extend the context window. Statewright goes in the opposite direction: make the problem space smaller.

The core idea is elegant. Based on which phase (State) the agent is currently in, physically restrict which tools it can access. Planning phase: read-only tools only. Implementation: edit tools unlocked. Testing: bash commands only. This isn't a prompt asking the model to "please only use these tools" — it's a protocol-level rejection of unauthorized tool calls.
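
To make the difference between a prompt and an enforced rule concrete, here is a minimal sketch of state-gated tool dispatch. It is not Statewright's actual API. The `ToolGate` class, the tool names, and the state-to-tool mapping are hypothetical, including the assumption that read tools stay available during implementation. The only point is that the check lives outside the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when a tool call is rejected for the current state."""


# Hypothetical mapping from workflow state to the tools it may use.
ALLOWED_TOOLS = {
    "planning": frozenset({"read_file", "list_dir"}),
    "implementation": frozenset({"read_file", "edit_file"}),
    "testing": frozenset({"bash"}),
}

# Stub tools so the sketch runs without touching the file system.
TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": lambda path: f"contents of {path}",
    "list_dir": lambda path: f"entries of {path}",
    "edit_file": lambda path, text: f"edited {path}",
    "bash": lambda command: f"ran {command}",
}


@dataclass
class ToolGate:
    tools: Dict[str, Callable[..., str]]
    state: str = "planning"

    def call(self, tool_name: str, *args: str) -> str:
        # The check runs before dispatch, so the model cannot talk its way past it.
        if tool_name not in ALLOWED_TOOLS[self.state]:
            raise ToolNotAllowed(f"{tool_name!r} is not allowed in state {self.state!r}")
        return self.tools[tool_name](*args)


gate = ToolGate(tools=TOOLS)
print(gate.call("read_file", "main.py"))  # allowed while planning

try:
    gate.call("edit_file", "main.py", "new text")  # rejected while planning
except ToolNotAllowed as err:
    print("rejected:", err)
```

Because the gate raises before the tool function is reached, the outcome does not depend on how the model was prompted.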

The core principle: "Agents are suggestions, states are laws"

That's creator Ben Cochran's framing. When a model tries to skip a phase or use the wrong tool, the protocol itself rejects the call instead of issuing a politely worded warning. It's structural enforcement, not advisory guidance.

What actually changes?

There are already similar-looking tools — LangGraph, XState, Claude Code. Here's how Statewright differs.

| | Existing frameworks | Statewright |
|---|---|---|
| Tool access control | Prompt-based (advisory) | State machine (enforced) |
| On rule violation | Model can ignore it | Protocol-level rejection |
| Model routing | Manual configuration | Automatic per-phase routing |
| Input token efficiency | Full tool list exposed | Current-state tools only |
| Cost reduction potential | | Up to 80% on multi-phase workflows |

LangGraph connects agents as graph nodes with specialized roles. The philosophy of specialization improving performance is similar — but LangGraph still relies on prompts to guide which tools to use, not physical enforcement. Compared to Claude Code, the difference is even more striking — Claude Code starts with 35,000+ tokens of context overhead. Statewright exposes only the tools relevant to the current state, dramatically reducing input tokens and improving cache efficiency.
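
The token-efficiency argument can be illustrated with a small sketch that sends only the current state's tool definitions to the model. Everything here is an assumption made for illustration: the tool schemas, the state names, and the use of serialized character count as a crude stand-in for input tokens.

```python
import json

# Hypothetical tool schemas; real tool definitions are usually far longer.
TOOL_SCHEMAS = {
    "read_file": {"description": "Read a file from the workspace.",
                  "parameters": {"path": "string"}},
    "list_dir": {"description": "List the entries of a directory.",
                 "parameters": {"path": "string"}},
    "edit_file": {"description": "Replace text in a file.",
                  "parameters": {"path": "string", "old": "string", "new": "string"}},
    "bash": {"description": "Run a shell command.",
             "parameters": {"command": "string"}},
}

# Hypothetical mapping from workflow state to exposed tools.
STATE_TOOLS = {
    "planning": ["read_file", "list_dir"],
    "implementation": ["read_file", "edit_file"],
    "testing": ["bash"],
}


def tools_for_state(state: str) -> list:
    """Return only the tool definitions the current state may see."""
    return [{"name": name, **TOOL_SCHEMAS[name]} for name in STATE_TOOLS[state]]


def payload_chars(tools: list) -> int:
    """Serialized size in characters, a rough proxy for input tokens."""
    return len(json.dumps(tools))


ALL_TOOLS = [{"name": name, **schema} for name, schema in TOOL_SCHEMAS.items()]

for state in STATE_TOOLS:
    exposed = tools_for_state(state)
    print(f"{state:>14}: {len(exposed)} of {len(ALL_TOOLS)} tools, "
          f"{payload_chars(exposed)} of {payload_chars(ALL_TOOLS)} characters")
```

With four toy tools the saving is small. The gap grows with the size of the full tool list, since each state still sees only its own subset.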

And here's the most counterintuitive finding. Among models of 13B parameters and up, structurally constrained smaller models consistently outperformed unconstrained frontier models. The pattern held across Qwen-coder, GPT-OSS, Gemma4, Haiku, Sonnet, and Opus. Around the same time, an independently published project called Forge reached the same conclusion. Two projects converging on the same result is a meaningful signal.

The quick-start guide

  1. Install — Connect to your editor via MCP
    Install the Statewright plugin for Claude Code, Codex, Oh-My-Codex, or other MCP-compatible editors. The core engine and agent crates are Apache 2.0, so there's no cost to get started.
  2. Define your workflow states
    Use YAML or JSON to define states and transition conditions. Specify phases like "planning → implementation → testing" and set guard conditions for each phase transition.
  3. Assign tool access per state
    Specify which tools are allowed in each state. Planning: file reads only. Implementation: editing tools. Testing: bash execution only. These constraints are enforced at the protocol level — not in the prompt.
  4. Configure per-phase model routing (optional)
    To cut costs, route planning to Haiku, implementation to Sonnet, and review to Opus. On multi-phase workflows, this can reduce costs by up to 80%.
  5. Run and check audit logs
    Statewright logs every state transition and tool access attempt. That gives full traceability of what was blocked and when, which makes it suitable for SOC 2 compliance and enterprise change management workflows.
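
Steps 2 through 5 can be sketched together as a toy engine. This is not Statewright's configuration format or API. The JSON schema, the guard names, the `Engine` class, and the model assigned to the testing state are all hypothetical, chosen only to show states, guarded transitions, per-state tools, per-phase model routing, and an audit log in one place.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical workflow definition (the schema is invented for this sketch).
WORKFLOW_JSON = """
{
  "initial": "planning",
  "states": {
    "planning": {"tools": ["read_file"], "model": "haiku",
                 "next": "implementation", "guard": "plan_approved"},
    "implementation": {"tools": ["read_file", "edit_file"], "model": "sonnet",
                       "next": "testing", "guard": "files_changed"},
    "testing": {"tools": ["bash"], "model": "haiku",
                "next": null, "guard": null}
  }
}
"""


@dataclass
class Engine:
    workflow: Dict[str, Any]
    facts: Dict[str, bool] = field(default_factory=dict)
    audit_log: List[Dict[str, Any]] = field(default_factory=list)
    state: str = ""

    def __post_init__(self) -> None:
        self.state = self.workflow["initial"]

    def _log(self, event: str, detail: Any, allowed: bool) -> None:
        # Every attempt is recorded, whether it was allowed or blocked.
        self.audit_log.append(
            {"state": self.state, "event": event, "detail": detail, "allowed": allowed}
        )

    def model(self) -> str:
        """Model routed to the current phase."""
        return self.workflow["states"][self.state]["model"]

    def request_tool(self, tool: str) -> bool:
        allowed = tool in self.workflow["states"][self.state]["tools"]
        self._log("tool_call", tool, allowed)
        return allowed

    def advance(self) -> bool:
        spec = self.workflow["states"][self.state]
        target, guard = spec["next"], spec["guard"]
        # A transition needs a target state and a satisfied guard condition.
        allowed = target is not None and bool(self.facts.get(guard, False))
        self._log("transition", target, allowed)
        if allowed:
            self.state = target
        return allowed


engine = Engine(json.loads(WORKFLOW_JSON))
engine.request_tool("edit_file")      # blocked: planning is read-only
engine.advance()                      # blocked: guard "plan_approved" not met
engine.facts["plan_approved"] = True
engine.advance()                      # allowed: moves to implementation
engine.request_tool("edit_file")      # allowed now

for entry in engine.audit_log:
    print(entry)
print("current model:", engine.model())
```

The audit log records blocked attempts alongside allowed ones, which is the property the compliance use case depends on.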

~80%: Cost reduction on multi-phase workflows (with per-phase model routing)
13B+: Parameter threshold for consistent improvement over unconstrained frontier models
Apache 2.0: License for core engine + agent crates (fully open source)

Go deeper

Show HN: Statewright Discussion (news.ycombinator.com)
Q&A with creator Ben Cochran. Licensing policy, design intent, and direct comparisons to LangGraph and XState, all in one thread.

Building Effective Agents (Anthropic, anthropic.com)
Anthropic's take on the structural causes of agent unreliability and why simplicity should come first. The foundational read for anyone building with agents.

LangGraph: Multi-Agent Workflows (langchain.com)
The graph-based multi-agent orchestration framework. Worth reading alongside Statewright to understand the two different philosophical approaches to agent control.

Agents (Chip Huyen, huyenchip.com)
The mathematical breakdown of why agent errors compound. The "95% accuracy → 0.6% success at 100 steps" calculation comes from here.

XState Documentation (xstate.js.org)
The reference for state machines in UI development. Useful context for understanding what Statewright adapted specifically for agentic tool access control.