GitHub Copilot's token billing started on June 1. Microsoft unveiled its own coding model on June 2 — one day later. Could be a coincidence. But once you understand how this model is designed, you might think otherwise.
137 billion parameters — so why is it cheap and fast?
MAI-Code-1-Flash has 137B total parameters, but only 5B activate during inference. That's the Mixture-of-Experts (MoE) architecture.
Think of it like a team of specialists. When a patient comes in, only the relevant doctor or two handles the case while the rest stay on standby. The model works the same way: for each token, a router picks the most relevant experts, so only about 5B parameters do any work. The other 132B sit out that computation. You get the breadth of a 137B model at roughly the per-token compute of a 5B one, although all 137B parameters still have to be held in memory.
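To make the specialist analogy concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Microsoft's implementation: the expert count, the `top_k` value, the scalar "experts", and the random router scores are all hypothetical, chosen only to show that a small subset of experts runs for each token.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Renormalize so the weights of the selected experts sum to 1.
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_scores, top_k):
    """Weighted sum over the selected experts only; the others are never called."""
    out = 0.0
    for idx, weight in route_token(router_scores, top_k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
router_scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_token(router_scores, TOP_K)
y = moe_forward(1.0, experts, router_scores, TOP_K)
print("selected experts and weights:", selected)
print("output:", y)

# Active share from the figures quoted in the article: 5B of 137B parameters.
active_fraction = 5 / 137
print(f"{active_fraction:.1%} of parameters active per token")
```

With the article's numbers, the active share works out to about 3.6% of the parameters per token, which is where the compute savings come from.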
Fast, affordable, and smart: the MoE architecture is why. Pricing lands at $0.75 per 1M input tokens and $4.50 per 1M output tokens. On top of that, it uses up to 60% fewer tokens on hard problems.
Trained on real workflows, not just benchmarks
Most coding models are optimized to score well on SWE-Bench and similar benchmarks. MAI-Code-1-Flash took a different approach. It was trained directly inside GitHub Copilot's production harness — actual file edits, terminal calls, and multi-turn conversations.
And one more thing: no knowledge distillation from OpenAI or any third-party model. Built entirely on Microsoft's own clean, traceable, enterprise-grade data. It's as much a declaration of AI independence as it is a product launch.
| | Typical coding model | MAI-Code-1-Flash |
|---|---|---|
| Training environment | Benchmark optimization | Copilot production harness |
| Data source | Varies (may include distillation) | Self-collected, no third-party distillation |
| SWE-Bench Pro | 35.2% (Claude Haiku 4.5) | 51.2% (+16 points) |
| SWE-Bench Verified | 66.6% (Claude Haiku 4.5) | 71.6% |
| Token efficiency | Baseline | Up to 60% fewer tokens on hard tasks |
On instruction following (IF Bench), it leads Claude Haiku 4.5 by 28.9 points. On an adversarial reasoning benchmark spanning 186 questions across 34 categories, it hit 85.8% adjusted accuracy. Not what you'd expect from a "small" model.
The billing connection
GPT-5.5 runs $5 input / $30 output per 1M tokens. MAI-Code-1-Flash is $0.75 / $4.50, roughly 6.7x cheaper per token on both input and output, and it uses up to 60% fewer tokens on hard tasks. The difference in your monthly bill can be substantial.
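As a rough illustration of that gap, the sketch below plugs the quoted per-1M-token prices into a simple cost formula. The monthly token volumes are hypothetical, and applying the "up to 60% fewer tokens" figure to output tokens only is an assumption made for this example; the article does not say which tokens the reduction covers.

```python
# Per-1M-token prices quoted in the article (USD).
PRICES = {
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "MAI-Code-1-Flash": {"input": 0.75, "output": 4.50},
}


def monthly_cost(model, input_tokens, output_tokens, output_token_factor=1.0):
    """Cost in USD; output_token_factor scales how many output tokens are generated."""
    price = PRICES[model]
    input_cost = input_tokens / 1_000_000 * price["input"]
    output_cost = output_tokens * output_token_factor / 1_000_000 * price["output"]
    return input_cost + output_cost


# Hypothetical monthly workload, not taken from the article.
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

baseline = monthly_cost("GPT-5.5", INPUT_TOKENS, OUTPUT_TOKENS)
flash_same_tokens = monthly_cost("MAI-Code-1-Flash", INPUT_TOKENS, OUTPUT_TOKENS)
# Assumption: the best-case 60% reduction applies to output tokens only.
flash_best_case = monthly_cost(
    "MAI-Code-1-Flash", INPUT_TOKENS, OUTPUT_TOKENS, output_token_factor=0.4
)

print(f"GPT-5.5:                       ${baseline:,.2f}")
print(f"MAI-Code-1-Flash, same tokens: ${flash_same_tokens:,.2f}")
print(f"MAI-Code-1-Flash, best case:   ${flash_best_case:,.2f}")
```

Under these assumptions the bill drops from about $2,200 to about $330 on price alone, and to about $222 if the full 60% output reduction applies. Real savings depend on your own token mix, so treat this as a template for your own numbers.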
How to set up MAI-Code-1-Flash in Copilot's model picker
- Update VS Code + Copilot extension
  The model picker only shows in the latest version. Update the GitHub Copilot extension from VS Code's Extensions tab.
- Select in picker or use Auto
  In the Copilot Chat panel, click the dropdown for the model list. Pick MAI-Code-1-Flash directly, or choose Auto to let Copilot route based on task type automatically.
- Task-based routing guide
  Inline edits, refactors, short bug fixes, repo Q&A, repetitive tasks → MAI-Code-1-Flash. Complex architecture design, deep security reviews, large-scale autonomous implementations → frontier models (MAI-Thinking-1, Claude Opus, etc.).
- Business/Enterprise users
  General availability for Business and Enterprise plans rolled out June 26, 2026. If it's not in your picker yet, give it a few days or check GitHub Community Discussions.
- Monitor usage dashboard
  Check the Usage Dashboard in Copilot settings to see per-model token consumption. Verify the token savings in real numbers on your own workflows.
When to reach for a different model
For major architecture decisions, long autonomous implementations, and complex multi-system debugging, MAI-Code-1-Flash may not be the best choice. It's optimized as a fast first responder for everyday coding — escalate to larger models when you need deeper reasoning.
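If you pick models by hand rather than using Auto, the routing guidance in this article can be written down as a small lookup. This is only a sketch of the article's rule of thumb: the `Task` enum and `pick_model` helper are hypothetical names, and nothing here reflects how Copilot's Auto mode actually routes requests.

```python
from enum import Enum


class Task(Enum):
    # Everyday tasks the article assigns to MAI-Code-1-Flash.
    INLINE_EDIT = "inline edit"
    REFACTOR = "refactor"
    SHORT_BUG_FIX = "short bug fix"
    REPO_QA = "repo Q&A"
    REPETITIVE = "repetitive task"
    # Tasks the article suggests escalating to a frontier model.
    ARCHITECTURE_DESIGN = "complex architecture design"
    SECURITY_REVIEW = "deep security review"
    LARGE_AUTONOMOUS_IMPL = "large-scale autonomous implementation"


FLASH = "MAI-Code-1-Flash"
FRONTIER = "frontier model (e.g. MAI-Thinking-1, Claude Opus)"

FRONTIER_TASKS = {
    Task.ARCHITECTURE_DESIGN,
    Task.SECURITY_REVIEW,
    Task.LARGE_AUTONOMOUS_IMPL,
}


def pick_model(task: Task) -> str:
    """Hypothetical helper: default to the fast model, escalate for deep-reasoning tasks."""
    return FRONTIER if task in FRONTIER_TASKS else FLASH


for task in Task:
    print(f"{task.value:40s} -> {pick_model(task)}")
```

Defaulting to the cheaper model and escalating only for a short, explicit set of task types keeps the rule easy to audit against the usage dashboard.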
MAI-Code-1-Flash currently runs in VS Code, and GitHub's changelog announces its expansion to JetBrains, Eclipse, Xcode, mobile, and the CLI.