Claude Opus 4.7 has reclaimed the top spot on SWE-bench. It beat GPT-5.4 and Gemini 3.1 Pro across coding benchmarks to get back to number one — but once you actually use it, your token budget is going to feel the pain.
What: Anthropic's latest flagship model, Claude Opus 4.7, released April 16, 2026
The gist: SWE-bench Pro score of 64.3% reclaims the coding top spot, vision resolution up 3x, agentic workflow performance improved by 14%
The catch: A new tokenizer converts the same input into up to 1.35x as many tokens (as much as 35% more), and output tokens spike significantly during high-effort reasoning
What Is It?
Opus 4.7, released by Anthropic on April 16, is a direct upgrade to its predecessor, Opus 4.6. Anthropic's core pitch: you can hand it the hardest coding tasks without anyone watching over it.
In practice, the model's self-verification abilities are impressive. In one test, it built a text-to-speech engine from scratch in Rust, then ran the audio it generated through a separate speech recognizer to verify it matched a Python reference implementation — autonomously. That's senior-engineer-level work covering months of effort, done without human supervision.
Key shift: Opus 4.7 follows instructions literally. Where previous models would loosely interpret your prompts, this one executes them precisely — which means prompts that worked fine before may produce unexpected results. Anthropic officially recommends re-tuning your prompts.
Pricing matches Opus 4.6: $5 per million input tokens, $25 per million output tokens. It's available right now on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
What Changes?
Let's start with the numbers. Opus 4.7 isn't first everywhere, but it holds a clear lead in the areas developers actually care about.
| Benchmark | Opus 4.6 | Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 80.8% | 87.6% | - | 80.6% |
| SWE-bench Pro | 53.4% | 64.3% | 57.7% | 54.2% |
| MCP-Atlas (Tool Use) | 75.8% | 77.3% | 68.1% | 73.9% |
| OSWorld (Computer Use) | 72.7% | 78.0% | 75.0% | - |
| GPQA Diamond (Reasoning) | 91.3% | 94.2% | 94.4% | 94.3% |
| BrowseComp (Web Search) | 83.7% | 79.3% | 89.3% | 85.9% |
| GDPVal-AA (Knowledge Work Elo) | - | 1,753 | 1,674 | 1,314 |
Clear number one in coding and tool use, effectively tied in pure reasoning, and actually down 4.4 points in web search (BrowseComp). This isn't a do-everything model — it's purpose-built for coding and agentic work.
Heads Up: Opus 4.7 (79.3%) actually scores lower than 4.6 (83.7%) on BrowseComp. If you're running an agent where web research is central, GPT-5.4 Pro (89.3%) or Gemini 3.1 Pro (85.9%) may be a better fit.
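To make the table easier to sanity-check, here is a small sketch that recomputes the generation-over-generation deltas and the per-benchmark leader. The scores are copied from the table above; the dictionary layout and the function names are illustrative choices, and GDPVal-AA is left out because it is an Elo rating rather than a percentage.

```python
# Scores copied from the benchmark table above; None marks a "-" entry.
SCORES = {
    "SWE-bench Verified": {"Opus 4.6": 80.8, "Opus 4.7": 87.6, "GPT-5.4": None, "Gemini 3.1 Pro": 80.6},
    "SWE-bench Pro": {"Opus 4.6": 53.4, "Opus 4.7": 64.3, "GPT-5.4": 57.7, "Gemini 3.1 Pro": 54.2},
    "MCP-Atlas": {"Opus 4.6": 75.8, "Opus 4.7": 77.3, "GPT-5.4": 68.1, "Gemini 3.1 Pro": 73.9},
    "OSWorld": {"Opus 4.6": 72.7, "Opus 4.7": 78.0, "GPT-5.4": 75.0, "Gemini 3.1 Pro": None},
    "GPQA Diamond": {"Opus 4.6": 91.3, "Opus 4.7": 94.2, "GPT-5.4": 94.4, "Gemini 3.1 Pro": 94.3},
    "BrowseComp": {"Opus 4.6": 83.7, "Opus 4.7": 79.3, "GPT-5.4": 89.3, "Gemini 3.1 Pro": 85.9},
}


def delta_vs_predecessor(benchmark: str) -> float:
    """Percentage-point change from Opus 4.6 to Opus 4.7, rounded to one decimal."""
    row = SCORES[benchmark]
    return round(row["Opus 4.7"] - row["Opus 4.6"], 1)


def leader(benchmark: str) -> str:
    """Model with the highest reported score, ignoring missing entries."""
    reported = {model: score for model, score in SCORES[benchmark].items() if score is not None}
    return max(reported, key=reported.get)


for name in SCORES:
    print(f"{name}: {delta_vs_predecessor(name):+.1f} pts vs 4.6, leader: {leader(name)}")
```

Run as-is, the BrowseComp row comes out at -4.4 points with GPT-5.4 in the lead, which matches the warning above.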
Vision That's 3x Sharper
Image processing resolution is now up to 2,576px on the long side (roughly 3.75 megapixels) — more than triple the previous model's capability. Autonomous security testing firm XBOW confirmed visual accuracy jumped from 54.5% to 98.5%. Screenshot-reading computer use agents, complex technical diagram interpretation, dense UI navigation — things that were previously too blurry to work with are now fair game.
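If you want to know how a given screenshot relates to that limit, a rough client-side calculation looks like the sketch below. It only uses the 2,576px long-side figure quoted above; the function name is made up for illustration, and whether the API downsamples larger images on its own is not something this article covers, so treat it as a planning aid rather than a required preprocessing step.

```python
MAX_LONG_SIDE_PX = 2576  # long-side limit quoted in the article


def fit_long_side(width: int, height: int, max_long: int = MAX_LONG_SIDE_PX) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_long.

    Images already within the limit are returned unchanged (no upscaling).
    """
    long_side = max(width, height)
    if long_side <= max_long:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_long / long_side))
    new_height = max(1, round(height * max_long / long_side))
    return new_width, new_height


if __name__ == "__main__":
    w, h = fit_long_side(3840, 2160)  # a 4K screenshot
    print(w, h, f"{w * h / 1_000_000:.2f} MP")
```

For a 16:9 4K screenshot this lands at 2576 x 1449, about 3.73 megapixels, which is in line with the "roughly 3.75 megapixels" figure.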
Real Gains in Agentic Workflows
There are changes here that don't reduce to a single number.
The CEO of Cognition (Devin) said: "4.7 works consistently for hours and doesn't give up on hard problems." Factory Droids noted that "a model that used to stop halfway through now sees things to completion," and Replit's CEO described it as "like a colleague who pushes back in technical discussions."
The Token Cost Shadow
Here's the thing — Opus 4.7 genuinely thinks more, and it definitely spends more.
Two reasons token usage is up:
1. New tokenizer: the same input now converts to 1.0–1.35x as many tokens (up to 35% more).
2. Deeper reasoning: output tokens spike significantly, especially in later turns of agentic sessions.
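To put rough numbers on those two effects, here is a back-of-the-envelope sketch using the prices and the tokenizer range quoted in this article. Two assumptions to flag: the multiplier is applied to input tokens only (the article only states it for input), and `output_growth` is a hypothetical knob you would set from your own measurements, since the article gives no figure for the output increase.

```python
# Prices quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0
# Tokenizer inflation range quoted in the article, applied to input tokens only (assumption).
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)


def request_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for one request at the quoted prices."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def projected_cost_range(
    old_input_tokens: int,
    old_output_tokens: int,
    output_growth: float = 1.0,  # hypothetical: measure this on your own traffic
) -> tuple[float, float]:
    """Best-case and worst-case USD cost for a workload measured under the old tokenizer."""
    projected_output = old_output_tokens * output_growth
    low, high = (
        request_cost(old_input_tokens * multiplier, projected_output)
        for multiplier in TOKENIZER_MULTIPLIER_RANGE
    )
    return low, high


if __name__ == "__main__":
    low, high = projected_cost_range(200_000, 50_000)
    print(f"${low:.2f} to ${high:.2f}")  # prints "$2.25 to $2.60"
```

Because output tokens cost five times as much as input tokens, even a modest `output_growth` will usually move the total more than the tokenizer change does.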
In Decrypt's real-world testing, a single session burned through the entire token quota. The model would finish all the code, then rewrite the whole thing from scratch under the label "bug fixes and improvements." Then it would do it again. This behavior never showed up with Opus 4.6.
Anthropic is aware of the issue and has introduced a new effort parameter and task budget to address it.
| Effort Level | Characteristics | Best For |
|---|---|---|
| low/medium | Fast responses, minimal reasoning | Simple queries, data transformation |
| high | Balanced reasoning | General coding, analysis |
| xhigh (new) | Deep reasoning, between high and max | Complex agentic coding (Claude Code default) |
| max | Maximum reasoning, maximum tokens | Only for the hardest problems |
Task budget is in public beta and lets you cap an agent's token usage to prevent surprise bills.
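This article does not show the request fields for the effort parameter or the task budget, so the sketch below stays purely client-side and does not call the Anthropic API. It encodes the effort table as a lookup and adds a simple token ceiling you could enforce in your own agent loop. Every name here (`EFFORT_BY_TASK`, `pick_effort`, `TokenBudget`, the task labels) is hypothetical, and splitting "simple queries" and "data transformation" between low and medium is an arbitrary choice, since the table groups them.

```python
from dataclasses import dataclass

# Lookup distilled from the effort table above; task labels are illustrative.
EFFORT_BY_TASK = {
    "simple_query": "low",
    "data_transformation": "medium",
    "general_coding": "high",
    "agentic_coding": "xhigh",
    "hardest_problems": "max",
}


def pick_effort(task_type: str) -> str:
    """Return an effort level for a task label, falling back to 'high'."""
    return EFFORT_BY_TASK.get(task_type, "high")


class BudgetExceeded(RuntimeError):
    """Raised when an agent session crosses its token ceiling."""


@dataclass
class TokenBudget:
    """Client-side token ceiling for one agent session (not the API's task budget)."""

    limit: int
    used: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> int:
        """Record one turn's usage and return the tokens remaining."""
        self.used += input_tokens + output_tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")
        return self.limit - self.used
```

In an agent loop you would call `charge()` with the usage reported for each turn and stop the session when `BudgetExceeded` is raised. The server-side task budget should make this kind of guard unnecessary once you have adopted it.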
Getting Started
Here's what you need to know when migrating from Opus 4.6 to 4.7.
- Start with prompt re-tuning
4.7 follows instructions literally. Loosely worded "just figure it out" prompts can produce unexpected results. Test with representative traffic before switching over.
- Set your effort level
For coding and agentic tasks, start with high or xhigh. Reserve max for your hardest problems. Claude Code defaults to xhigh.
- Measure token costs
The new tokenizer means the same input can consume up to 35% more tokens. Measure your cost delta on real traffic before committing to a full rollout.
- Use Task Budget
For long-running agents, use the API's task budget (beta) to set a token ceiling and avoid unexpected charges.
- Be careful with web search agents
BrowseComp scores dropped, so for research-heavy workflows, it's worth evaluating GPT-5.4 Pro alongside Opus 4.7.
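For the "measure token costs" step, one way to do it is to replay the same set of requests against both models and compare the reported usage. The sketch below assumes you have already collected per-request token counts into simple records; the `Usage` type and `token_ratios` function are hypothetical helpers, and the numbers in the example are invented purely to show the shape of the output.

```python
from typing import NamedTuple


class Usage(NamedTuple):
    """Token counts reported for one request (hypothetical record type)."""

    input_tokens: int
    output_tokens: int


def token_ratios(baseline: list[Usage], candidate: list[Usage]) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of candidate totals over baseline totals.

    Both lists must come from replaying the same requests, in the same order.
    """
    if len(baseline) != len(candidate):
        raise ValueError("replay the same requests against both models")
    base_in = sum(u.input_tokens for u in baseline)
    base_out = sum(u.output_tokens for u in baseline)
    if base_in == 0 or base_out == 0:
        raise ValueError("baseline usage must be non-zero")
    cand_in = sum(u.input_tokens for u in candidate)
    cand_out = sum(u.output_tokens for u in candidate)
    return cand_in / base_in, cand_out / base_out


if __name__ == "__main__":
    # Invented numbers for illustration only.
    opus_46 = [Usage(1_000, 200), Usage(3_000, 800)]
    opus_47 = [Usage(1_200, 300), Usage(3_800, 1_200)]
    input_ratio, output_ratio = token_ratios(opus_46, opus_47)
    print(f"input x{input_ratio:.2f}, output x{output_ratio:.2f}")
```

Keeping the input and output ratios separate shows whether a cost increase comes from the tokenizer (input) or from deeper reasoning (output), which in turn tells you whether lowering the effort level is likely to help.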
Other Features Shipping Alongside It
There are a few more updates that shipped with Opus 4.7.
Deep Dive Resources
- Anthropic Official Announcement: The full release document, including benchmarks, safety profile, and migration guide for Opus 4.7. (anthropic.com)
- Vellum Benchmark Analysis: Detailed comparisons across key benchmarks (SWE-bench, MCP-Atlas, GPQA Diamond) and recommendations by migration scenario. (vellum.ai)
- Decrypt Hands-On Review: Real test results using a game-building prompt. Highest quality output yet, but it burned through the entire token quota in a single session. (decrypt.co)
- VentureBeat Deep Dive: Migration strategy from an enterprise perspective, plus analysis of Anthropic's market positioning. (venturebeat.com)
- TNW Tech Summary: A concise tech media rundown of pricing, availability, and key benchmarks. (thenextweb.com)
- Claude Opus 4.7 Migration Guide: What to watch for when switching from 4.6 to 4.7, and how to tune effort levels. (platform.claude.com)




