The more tools you give an AI agent, the better it performs — right? Turns out, it's the complete opposite. When tool access was physically restricted per phase, a 13B model consistently beat unconstrained frontier models.
Why do agents keep making mistakes?
If you've used a coding agent, you know the feeling. When it works, it's incredible — but when it goes sideways, it really goes sideways. Is this just because the model isn't smart enough?
AI researcher Chip Huyen actually analyzed this mathematically. Even if an agent maintains 95% accuracy per step, the per-step probabilities multiply: after 10 steps the overall success rate drops to about 60% (0.95^10), and over 100 steps it plummets to about 0.6% (0.95^100). Errors compound geometrically. That's a structural problem, not a model quality problem.
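To make the arithmetic concrete, here is a minimal sketch of the compounding calculation. It assumes every step succeeds or fails independently with the same accuracy, which is a simplification; the function name `chain_success_rate` is made up for illustration.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` steps succeed, assuming independent steps."""
    return per_step_accuracy ** steps


for steps in (1, 10, 100):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>3} steps: {rate:.1%}")

# Prints:
#   1 steps: 95.0%
#  10 steps: 59.9%
# 100 steps: 0.6%
```

The only lever in this formula besides per-step accuracy is the number of steps that can go wrong, which is why shrinking the problem space matters.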
Anthropic acknowledged this directly: "The autonomous nature of agents can lead to higher costs and the potential for compounding errors." The usual response? Use a bigger model. Extend the context window. Statewright goes the opposite direction: make the problem space smaller.
The core idea is elegant. Depending on which phase (state) the agent is currently in, the tools it can reach are hard-restricted. Planning phase: read-only tools only. Implementation: edit tools unlocked. Testing: bash commands only. This isn't a prompt asking the model to "please only use these tools"; it's a protocol-level rejection of unauthorized tool calls.
The core principle: "Agents are suggestions, states are laws"
That's creator Ben Cochran's framing. When a model tries to skip phases or use the wrong tool, the protocol itself rejects the call instead of issuing a politely worded warning. It's structural enforcement, not advisory guidance.
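The difference between advice and enforcement can be sketched in a few lines. This is not Statewright's code or API: the `Phase` enum, the `ToolGate` class, and the tool names are hypothetical, and letting the implementation phase keep read access is an assumption. The point is only that the check happens outside the model, so a disallowed call never runs.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    TESTING = "testing"


# Hypothetical tool names; the per-phase split follows the article's description.
# Keeping read access during implementation is an assumption.
ALLOWED_TOOLS = {
    Phase.PLANNING: {"read_file"},
    Phase.IMPLEMENTATION: {"read_file", "edit_file"},
    Phase.TESTING: {"bash"},
}


class ToolCallRejected(Exception):
    """Raised when a tool call is not permitted in the current phase."""


class ToolGate:
    """Sits between the model and the tools and enforces the allow-list."""

    def __init__(self, phase: Phase = Phase.PLANNING) -> None:
        self.phase = phase

    def call(self, tool_name, handler, *args, **kwargs):
        # The handler is only invoked if the current phase allows the tool.
        if tool_name not in ALLOWED_TOOLS[self.phase]:
            raise ToolCallRejected(
                f"{tool_name!r} is not allowed in phase {self.phase.value!r}"
            )
        return handler(*args, **kwargs)


gate = ToolGate()
print(gate.call("read_file", lambda path: f"contents of {path}", "README.md"))

try:
    gate.call("edit_file", lambda path: None, "README.md")
except ToolCallRejected as err:
    print("rejected:", err)
```

However the model phrases its request, the edit handler is unreachable during planning. A prompt-based rule has no equivalent guarantee.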
What actually changes?
There are already similar-looking tools — LangGraph, XState, Claude Code. Here's how Statewright differs.
| | Existing frameworks | Statewright |
|---|---|---|
| Tool access control | Prompt-based (advisory) | State machine (enforced) |
| On rule violation | Model can ignore it | Protocol-level rejection |
| Model routing | Manual configuration | Automatic per-phase routing |
| Input token efficiency | Full tool list exposed | Current-state tools only |
| Cost reduction potential | n/a | Up to 80% on multi-phase workflows |
LangGraph connects agents as graph nodes with specialized roles. The philosophy of specialization improving performance is similar — but LangGraph still relies on prompts to guide which tools to use, not physical enforcement. Compared to Claude Code, the difference is even more striking — Claude Code starts with 35,000+ tokens of context overhead. Statewright exposes only the tools relevant to the current state, dramatically reducing input tokens and improving cache efficiency.
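Here is a rough sketch of what "current-state tools only" means for the request payload. The tool schemas, state names, and helper functions are hypothetical, and serialized character count is used only as a crude stand-in for input tokens.

```python
import json

# Hypothetical tool schemas, loosely shaped like function-calling definitions.
TOOL_SCHEMAS = {
    "read_file": {
        "description": "Read a file from the workspace",
        "parameters": {"path": "string"},
    },
    "edit_file": {
        "description": "Replace a span of text in a file",
        "parameters": {"path": "string", "old": "string", "new": "string"},
    },
    "bash": {
        "description": "Run a shell command and return its output",
        "parameters": {"command": "string"},
    },
}

# Which tools each state exposes (assumed mapping, following the article).
STATE_TOOLS = {
    "planning": ["read_file"],
    "implementation": ["read_file", "edit_file"],
    "testing": ["bash"],
}


def all_tools() -> list:
    """Every tool definition, as an unconstrained agent would receive them."""
    return [{"name": name, **schema} for name, schema in TOOL_SCHEMAS.items()]


def tools_for_state(state: str) -> list:
    """Only the tool definitions the given state is allowed to see."""
    return [{"name": name, **TOOL_SCHEMAS[name]} for name in STATE_TOOLS[state]]


def payload_chars(tools: list) -> int:
    """Serialized size in characters, a crude proxy for input tokens."""
    return len(json.dumps(tools))


print("all tools:", payload_chars(all_tools()), "chars")
for state in STATE_TOOLS:
    print(f"{state}:", payload_chars(tools_for_state(state)), "chars")
```

With three toy tools the saving is small. The effect grows with the size of the tool catalog, and a stable per-state tool list is also friendlier to prompt caching.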
And here's the most counterintuitive finding. For models above 13B parameters, structurally constrained smaller models consistently outperformed unconstrained frontier models. The pattern held across Qwen-coder, GPT-OSS, Gemma4, Haiku, Sonnet, and Opus. Around the same time, an independently published project called Forge reached the same conclusion. Two projects converging on the same result is a meaningful signal.
The quick-start guide
- Install: connect to your editor via MCP
  Install the Statewright plugin for Claude Code, Codex, Oh-My-Codex, or other MCP-compatible editors. The core engine and agent crates are Apache 2.0, so there's no cost to get started.
- Define your workflow states
  Use YAML or JSON to define states and transition conditions. Specify phases like "planning → implementation → testing" and set guard conditions for each phase transition.
- Assign tool access per state
  Specify which tools are allowed in each state. Planning: file reads only. Implementation: editing tools. Testing: bash execution only. These constraints are enforced at the protocol level, not in the prompt.
- Configure per-phase model routing (optional)
  To cut costs, route planning to Haiku, implementation to Sonnet, and review to Opus. On multi-phase workflows, this can reduce costs by up to 80%.
- Run and check audit logs
  Statewright logs every state transition and tool access attempt. Full traceability of what was blocked and when, making it suitable for SOC 2 compliance and enterprise change management workflows.
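The steps above can be sketched end to end as a toy engine. Nothing here is Statewright's actual schema or API: the `WORKFLOW` dictionary, guard names, `Engine` class, and model labels are all hypothetical stand-ins for what a YAML or JSON definition might express, and the model assigned to the testing state is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical workflow definition; a real one would live in YAML or JSON.
WORKFLOW = {
    "initial": "planning",
    "states": {
        "planning": {"tools": ["read_file"], "model": "haiku",
                     "next": "implementation", "guard": "plan_written"},
        "implementation": {"tools": ["read_file", "edit_file"], "model": "sonnet",
                           "next": "testing", "guard": "files_changed"},
        # The model for testing is an assumption; the article does not name one.
        "testing": {"tools": ["bash"], "model": "sonnet",
                    "next": None, "guard": None},
    },
}

# Guard conditions decide whether a phase transition is allowed.
GUARDS = {
    "plan_written": lambda ctx: bool(ctx.get("plan")),
    "files_changed": lambda ctx: bool(ctx.get("changed_files")),
}


@dataclass
class Engine:
    state: str = WORKFLOW["initial"]
    context: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _log(self, event: str, detail, allowed: bool) -> None:
        # Records the state the attempt was made in, plus the outcome.
        self.audit_log.append(
            {"state": self.state, "event": event, "detail": detail, "allowed": allowed}
        )

    def model(self) -> str:
        """Model label routed to the current state."""
        return WORKFLOW["states"][self.state]["model"]

    def request_tool(self, tool: str) -> bool:
        """Log the attempt and report whether the current state allows the tool."""
        allowed = tool in WORKFLOW["states"][self.state]["tools"]
        self._log("tool_call", tool, allowed)
        return allowed

    def advance(self) -> bool:
        """Move to the next state only if one exists and its guard passes."""
        spec = WORKFLOW["states"][self.state]
        ok = spec["next"] is not None and GUARDS[spec["guard"]](self.context)
        self._log("transition", spec["next"], ok)
        if ok:
            self.state = spec["next"]
        return ok


engine = Engine()
engine.request_tool("edit_file")        # blocked: planning is read-only
engine.advance()                        # blocked: no plan recorded yet
engine.context["plan"] = "refactor the parser"
engine.advance()                        # allowed: guard passes
engine.request_tool("edit_file")        # allowed in implementation
for entry in engine.audit_log:
    print(entry)
```

Blocked attempts are logged alongside successful ones, which is what makes the trail useful for review. A production log would also carry timestamps and caller identity, omitted here for brevity.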
Go deeper
- Show HN: Statewright Discussion (news.ycombinator.com). Q&A with creator Ben Cochran. Licensing policy, design intent, and direct comparisons to LangGraph and XState, all in one thread.
- Building Effective Agents, Anthropic (anthropic.com). Anthropic's take on the structural causes of agent unreliability and why simplicity should come first. The foundational read for anyone building with agents.
- LangGraph: Multi-Agent Workflows (langchain.com). The graph-based multi-agent orchestration framework. Worth reading alongside Statewright to understand the two different philosophical approaches to agent control.
- Agents, Chip Huyen (huyenchip.com). The mathematical breakdown of why agent errors compound. The "95% accuracy → 0.6% success at 100 steps" calculation comes from here.
- XState Documentation (xstate.js.org). The reference for state machines in UI development. Useful context for understanding what Statewright adapted specifically for agentic tool access control.



