The AI coding agent ranking just changed. In April 2026, Claude Opus 4.7 hit 64.3% on SWE-bench Pro — beating both GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%). And now it's live on Amazon Bedrock. This isn't just "another access channel" — something specifically changed when it arrived on Bedrock.
What's the 64.3%?
There's a benchmark called SWE-bench. It measures how well AI can resolve bugs and feature requests pulled from real GitHub open-source repos. SWE-bench Pro is the hardest version — it uses actual production issues from major open-source projects. It's the most realistic indicator of "how useful is this coding agent in the real world."
Opus 4.7 scored 64.3%. That's up from 53.4% in Opus 4.6 — a 10.9-point improvement. It leads GPT-5.4 (57.7%) by 6.6 points and Gemini 3.1 Pro (54.2%) by over 10 points. If you're building or using a coding agent, this gap is genuinely noticeable in practice.
It's not just coding. On MCP-Atlas — which measures how well an AI handles external tools — Opus 4.7 hit 77.3%, ahead of GPT-5.4 (75.3%) and Gemini (73.9%). That's the key metric for building multi-agent workflows. The one regression: BrowseComp (web research) dropped to 79.3% from 83.7% in 4.6. The team focused on coding and tool use, and made a trade-off on web search.
Vision got a major upgrade too. Max image resolution jumped to 2,576 pixels on the long edge — more than 3x previous models. That matters for UI screenshot analysis, complex diagram parsing, and dense document processing. CharXiv visual reasoning jumped 13 points to 82.1% (from 69.1%).
What actually changed?
The biggest technical change in Opus 4.7 is Adaptive Thinking. Up through Opus 4.6, you had to manually set thinking.type: "enabled" and budget_tokens — telling the model "think for up to 1,000 tokens on this task" or "use 5,000 tokens here." Developer-tuned, every call. In 4.7, that's gone.
With 4.7, it's just thinking.type: "adaptive". The model judges task complexity itself and allocates reasoning tokens automatically. Simple questions get minimal compute; complex refactoring gets deep thinking. No more budget_tokens tuning — it's fully automatic.
| Opus 4.6 | Opus 4.7 | |
|---|---|---|
| Reasoning setup | thinking.type: "enabled" + manual budget_tokens | Just thinking.type: "adaptive" |
| temperature/top_p | Adjustable | Not supported — remove from requests |
| SWE-bench Pro | 53.4% | 64.3% (+10.9pts) |
| Image resolution | Previous level | Up to 2,576px on long edge (3x+) |
| Prompt cache TTL | 5 minutes | 5 min · 1 hour (your choice) |
| Visual reasoning (CharXiv) | 69.1% | 82.1% (+13pts) |
Migration warning from 4.6
Plugging Opus 4.6 code directly into 4.7 will throw a 400 error. You need to change thinking.type to "adaptive" and completely remove temperature, top_p, and top_k parameters. budget_tokens is also gone — Adaptive Thinking handles this automatically.
Pricing is unchanged: $5/M input tokens, $25/M output tokens — same as Opus 4.6. One thing to note: a new tokenizer means the same content may generate 1.0–1.35x more tokens than before. Actual costs may increase slightly, heads up.
The Quick Start: Bedrock in 5 Steps
- Set up AWS account + Bedrock API key
Generate a long-term API key in the Amazon Bedrock console. Set it as theAWS_BEARER_TOKEN_BEDROCKenvironment variable. - Install the SDK
For the Messages API:pip install -U "anthropic[bedrock]". For Converse/Invoke:pip install boto3. Pick one. - Send your first request
Model ID isanthropic.claude-opus-4-7, region defaults tous-east-1. For thinking, use only{"type": "adaptive"}— using enabled or budget_tokens throws a 400. - Optimize costs with prompt caching
Set cache checkpoints for repeated system prompts or documents (min 4,096 tokens). Choose 5-min or 1-hour TTL. Big savings on repeated calls. - Reduce latency with Geo inference
From Asia, usejp.anthropic.claude-opus-4-7(Tokyo/Osaka routing) orglobal.anthropic.claude-opus-4-7for automatic optimal region.
Bedrock's enterprise edge
Bedrock's next-generation inference engine prevents operator access to customer data. If you're already running VPC, IAM, and CloudWatch in AWS, you get enterprise-grade data isolation with no extra security setup.
If You Want to Dig Deeper
Introducing Claude Opus 4.7 — Anthropic The official launch post. Covers Adaptive Thinking design principles, safety evaluations, and cross-platform availability. anthropic.com
Claude Opus 4.7 in Amazon Bedrock — AWS Blog Official Bedrock launch post with Playground walkthrough, API code samples, and regional availability details. aws.amazon.com
Claude Opus 4.7 Benchmarks Explained — Vellum AI Deep-dive on MCP-Atlas, OSWorld, CharXiv, and side-by-side comparisons with GPT-5.4 and Gemini 3.1 Pro. vellum.ai
Amazon Bedrock Model Card — AWS Docs Adaptive Thinking migration guide, prompt caching setup, service tiers, and per-region routing specs in one place. docs.aws.amazon.com
Claude Opus 4.7 vs GPT-5.5 — DataCamp Side-by-side on coding, reasoning, and pricing. Includes areas where GPT-5.5 still leads (Terminal-Bench). datacamp.com




