GitHub Copilot's token billing started on June 1. Microsoft unveiled its own coding model on June 2 — one day later. Could be a coincidence. But once you understand how this model is designed, you might think otherwise.

3-Second Summary
137B/5B MoE · Trained in production · +16 pts vs Claude Haiku 4.5 on SWE-Bench Pro · Up to 60% fewer tokens · New daily coding default

137 billion parameters — so why is it cheap and fast?

MAI-Code-1-Flash has 137B total parameters, but only 5B activate during inference. That's the Mixture-of-Experts (MoE) architecture.

Think of it like a team of specialists. When a patient comes in, only the relevant doctor or two handles the case, and the rest stay on standby. The model works the same way: for each token, a router picks the most relevant experts, so only about 5B parameters do any work. The other 132B sit out that computation. You get the breadth of a 137B model with per-token compute closer to that of a 5B one, although all 137B parameters still have to be held in memory.
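To make the "specialists on standby" picture concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy, not Microsoft's implementation: the expert count, the `top_k` value, the router scores, and the function names `route_token` and `moe_layer` are all assumptions made for illustration. Only the 137B and 5B figures come from the article.

```python
import math
import random

TOTAL_PARAMS_B = 137  # total parameters, from the article
ACTIVE_PARAMS_B = 5   # parameters active per token, from the article


def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    picks = route_token(router_scores, top_k)
    return sum(weight * experts[i](token) for i, weight in picks)


if __name__ == "__main__":
    random.seed(0)
    num_experts = 8  # hypothetical; the real expert count is not given here
    # Each toy "expert" just scales its input by a different factor.
    experts = [lambda x, scale=s: x * scale for s in range(1, num_experts + 1)]
    scores = [random.gauss(0, 1) for _ in range(num_experts)]

    print("selected experts:", route_token(scores, top_k=2))
    print("layer output:", moe_layer(1.0, experts, scores))
    print(f"active share of parameters: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")
```

The point of the sketch is the shape of the computation: every expert exists, but only the selected few run for any given token, which is why the active share stays small while the total parameter count stays large.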

The MoE architecture is why the model can be fast, affordable, and capable at the same time. Pricing lands at $0.75 per 1M input tokens and $4.50 per 1M output tokens. On top of that, it uses up to 60% fewer tokens on hard problems.

137B total parameters
5B active at inference
256K-token context window

Trained on real workflows, not just benchmarks

Most coding models are optimized to score well on SWE-Bench and similar benchmarks. MAI-Code-1-Flash took a different approach. It was trained directly inside GitHub Copilot's production harness — actual file edits, terminal calls, and multi-turn conversations.

And one more thing: no knowledge distillation from OpenAI or any third-party model. Built entirely on Microsoft's own clean, traceable, enterprise-grade data. It's as much a declaration of AI independence as it is a product launch.

                      | Typical coding model               | MAI-Code-1-Flash
----------------------|------------------------------------|---------------------------------------------
Training environment  | Benchmark optimization             | Copilot production harness
Data source           | Varies (may include distillation)  | Self-collected, no third-party distillation
SWE-Bench Pro         | 35.2% (Claude Haiku 4.5)           | 51.2% (+16 points)
SWE-Bench Verified    | 66.6% (Claude Haiku 4.5)           | 71.6%
Token efficiency      | Baseline                           | Up to 60% fewer tokens on hard tasks

On instruction following (IF Bench), it leads Claude Haiku 4.5 by 28.9 points. On an adversarial reasoning benchmark spanning 186 questions across 34 categories, it hit 85.8% adjusted accuracy. Not what you'd expect from a "small" model.

The billing connection

GPT-5.5 runs $5 input / $30 output per 1M tokens. MAI-Code-1-Flash is $0.75 / $4.50 per 1M tokens, roughly 15% of the per-token price on both sides, and it uses up to 60% fewer tokens on hard tasks. The difference in your monthly bill can be substantial.
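A rough way to see what that means for a bill is to plug the per-token prices into a small calculator. The sketch below uses only the prices quoted above. The monthly token volumes are made up, and applying the "up to 60% fewer tokens" figure to output tokens only is an assumption, since the article does not say which tokens it covers. Actual Copilot billing may not map one-to-one onto these list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices as quoted in the article.
GPT_5_5 = Pricing(input_per_m=5.00, output_per_m=30.00)
MAI_CODE_1_FLASH = Pricing(input_per_m=0.75, output_per_m=4.50)


def cost_usd(pricing, input_tokens, output_tokens):
    """Cost in USD for a given number of input and output tokens."""
    total = input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


def monthly_comparison(input_tokens, output_tokens, output_reduction=0.0):
    """Return (GPT-5.5 cost, MAI-Code-1-Flash cost) for the same workload.

    output_reduction is the assumed fraction of output tokens the Flash
    model saves (0.6 models the article's best case of 60% fewer tokens).
    """
    baseline = cost_usd(GPT_5_5, input_tokens, output_tokens)
    flash_output = output_tokens * (1 - output_reduction)
    flash = cost_usd(MAI_CODE_1_FLASH, input_tokens, flash_output)
    return baseline, flash


if __name__ == "__main__":
    # Hypothetical month: 20M input tokens, 5M output tokens.
    for reduction in (0.0, 0.6):
        baseline, flash = monthly_comparison(20_000_000, 5_000_000, reduction)
        print(f"reduction={reduction:.0%}: GPT-5.5 ${baseline:.2f}, Flash ${flash:.2f}")
```

For that hypothetical workload, the GPT-5.5 side comes to $250, against $37.50 for MAI-Code-1-Flash at equal token counts and $24 if the full 60% reduction applied to output tokens. Substitute the numbers from your own usage before drawing conclusions.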

How to set up MAI-Code-1-Flash in Copilot's model picker

  1. Update VS Code + Copilot extension
    The model picker only shows in the latest version. Update the GitHub Copilot extension from VS Code's Extensions tab.
  2. Select in picker or use Auto
    In the Copilot Chat panel, click the dropdown for the model list. Pick MAI-Code-1-Flash directly, or choose Auto to let Copilot route based on task type automatically.
  3. Task-based routing guide
    Inline edits, refactors, short bug fixes, repo Q&A, repetitive tasks → MAI-Code-1-Flash. Complex architecture design, deep security reviews, large-scale autonomous implementations → frontier models (MAI-Thinking-1, Claude Opus, etc.).
  4. Business/Enterprise users
    General availability for Business and Enterprise plans rolled out June 26, 2026. If it's not in your picker yet, give it a few days or check GitHub Community Discussions.
  5. Monitor usage dashboard
    Check the Usage Dashboard in Copilot settings to see per-model token consumption. Verify the token savings in real numbers on your own workflows.
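The routing guide in step 3 can be written down as a small lookup, which is handy if a team wants a shared convention. This is only a sketch of the rule of thumb: the task labels and the `suggest_model` function are hypothetical, and nothing here calls Copilot or reflects how its Auto mode actually decides.

```python
FAST_MODEL = "MAI-Code-1-Flash"
FRONTIER_MODEL = "frontier model (e.g. MAI-Thinking-1 or Claude Opus)"

# Task labels are invented for this example; they mirror the guide in step 3.
FAST_TASKS = {
    "inline_edit",
    "refactor",
    "short_bug_fix",
    "repo_qa",
    "repetitive_task",
}
FRONTIER_TASKS = {
    "architecture_design",
    "security_review",
    "autonomous_implementation",
}


def suggest_model(task_type):
    """Suggest a model for a task label, defaulting to the fast model."""
    if task_type in FRONTIER_TASKS:
        return FRONTIER_MODEL
    # Everyday and unrecognized tasks start on the fast model; escalate by hand
    # if the answer is not good enough.
    return FAST_MODEL


if __name__ == "__main__":
    for task in ("refactor", "security_review", "write_docs"):
        print(f"{task} -> {suggest_model(task)}")
```

Defaulting unknown tasks to the fast model matches the article's framing of it as the daily default, with escalation as the exception.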

When to reach for a different model

For major architecture decisions, long autonomous implementations, and complex multi-system debugging, MAI-Code-1-Flash may not be the best choice. It's optimized as a fast first responder for everyday coding — escalate to larger models when you need deeper reasoning.

Here's where MAI-Code-1-Flash currently runs:

IDEs: VS Code, Visual Studio, JetBrains IDEs, Eclipse, Xcode

GitHub services: Copilot Chat on GitHub, GitHub Mobile, Copilot cloud agent

CLI: Copilot CLI (use it directly in your terminal)

Want to go deeper?

Introducing MAI-Code-1-Flash (microsoft.ai). Official announcement from Microsoft's Superintelligence team. Full training methodology, MoE architecture, and benchmark breakdown.

MAI-Code-1-Flash is now available for GitHub Copilot (github.blog). Initial launch changelog with gradual rollout schedule across Copilot tiers and model picker instructions.

MAI-Code-1-Flash available on more Copilot surfaces (github.blog). Expansion to JetBrains, Eclipse, Xcode, mobile, and CLI: 9 additional platforms.

MAI-Code-1-Flash for Copilot Business and Enterprise (github.blog). Enterprise rollout announcement and availability timeline.

Microsoft MAI-Code-1-Flash in GitHub Copilot: Pricing and Performance (smartscope.blog). Pricing structure breakdown and practical use case analysis.

MAI-Code-1-Flash: Microsoft's Copilot-Native Coding Model (chatforest.com). Developer-perspective analysis of model routing and real-world use cases.

GitHub Copilot's Token Billing Backlash Hits as Microsoft Build 2026 Opens With MAI (the-agent-report.com). Strategic context: the billing change and MAI launch timing analyzed.