In a market where Big Tech spends billions and fields thousands of engineers, a 26-person startup built a 400B-parameter open-source LLM for $20 million. And it became the #1 most-used open model on OpenClaw.
What is this?
Arcee AI is a San Francisco startup that most people haven't heard of yet. It started as a B2B business doing LLM fine-tuning for enterprise clients like SK Telecom — but CEO Mark McQuade decided they couldn't keep relying on other companies' models. So in late 2025, they started building their own from scratch.
The result is the Trinity series. They shipped smaller models first (Nano 6B, Mini 26B) in December 2025, then Trinity Large (400B) in January 2026, and finally Trinity-Large-Thinking — a reasoning-enhanced version — on April 1, 2026. All of this happened in just 9 months on a total budget of $20 million.
The timing is significant. When Anthropic announced that Claude Code subscribers would need to pay extra to use OpenClaw, the community started looking for alternatives. Trinity-Large-Thinking scored 91.9 on PinchBench — the benchmark specifically designed for OpenClaw agent tasks — just behind Claude Opus 4.6's 93.3. At $0.90 per million output tokens, it's 96% cheaper.
What makes it different?
The biggest differentiator is the license. Trinity ships under Apache 2.0 — no strings attached. Meta's Llama has faced criticism for its restrictive commercial conditions, and Chinese models (DeepSeek, Qwen) — while technically impressive — are a no-go for many U.S. and European companies due to data sovereignty concerns.
Trinity fills that gap. Anyone can download the weights, run them on-premises, fine-tune on their own data, and deploy commercially. No restrictions. Hugging Face co-founder Clement Delangue put it well: "The strength of the US has always been its startups. Arcee shows that it's possible!"
| | Claude / GPT-4o (Closed) | Trinity-Large-Thinking | Llama 4 (Meta) |
|---|---|---|---|
| License | API lock-in, proprietary | Apache 2.0 (fully open) | Meta conditional license |
| On-premises | Not possible | Yes (download weights) | Yes (commercial limits apply) |
| PinchBench | 93.3 (Opus 4.6) | 91.9 | N/A |
| Cost (1M output tokens) | $25 (Opus) | $0.90 | Varies by cloud |
| Active parameters | Dense architecture | 13B active / 400B total (MoE) | Maverick MoE |
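The "96% cheaper" headline is simple arithmetic on the per-token prices in the table above; a quick sketch to verify it (variable names are mine, prices from the article):

```python
# Output-token prices per 1M tokens, as quoted in the comparison table.
opus_price = 25.00      # Claude Opus
trinity_price = 0.90    # Trinity-Large-Thinking

# Relative savings: 1 - (0.90 / 25.00) = 0.964, i.e. ~96% cheaper.
savings_pct = (1 - trinity_price / opus_price) * 100
print(f"Trinity is {savings_pct:.1f}% cheaper per 1M output tokens")
```

The exact figure is 96.4%, which the article rounds down to 96%.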
The architecture is clever. Trinity uses a Mixture-of-Experts (MoE) design with 256 experts, of which only 4 (1.56%) activate per token. Total parameters: 400B; active at inference: just 13B, about 3.3% of the total. The result is 2–3x faster inference than comparable dense models on the same hardware.
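To make the routing idea concrete, here is a minimal sketch of top-k MoE routing in plain Python. This is not Arcee's implementation; the dimensions are toy-sized, but the mechanism is the same: a gating network scores all experts, only the top 4 of 256 actually run, and their outputs are blended with softmax weights.

```python
import math
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 8, 256, 4
calls = []  # records which experts actually execute

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each "expert" is a tiny linear map (a DIM x DIM matrix).
expert_weights = [[rand_vec(DIM) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def run_expert(i, x):
    calls.append(i)  # count real compute: only routed experts land here
    return [dot(row, x) for row in expert_weights[i]]

# The router ("gate") produces one logit per expert for each token.
gate = [rand_vec(DIM) for _ in range(N_EXPERTS)]

def moe_forward(x):
    logits = [dot(g, x) for g in gate]
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in top)
    w = [math.exp(logits[i] - m) for i in top]      # softmax over top-k only
    s = sum(w)
    w = [v / s for v in w]
    outs = [run_expert(i, x) for i in top]
    # Weighted sum of the k expert outputs.
    return [sum(wi * o[d] for wi, o in zip(w, outs)) for d in range(DIM)]

y = moe_forward(rand_vec(DIM))
print(f"experts run: {len(calls)} of {N_EXPERTS}")
```

At scale this is why 400B total parameters can cost only 13B parameters' worth of compute per token: the other 252 experts sit idle for that token.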
Quick start: How to use it
There are three ways to get started with Trinity, plus one setting to flip in OpenClaw.
- Try it on OpenRouter (fastest). Select `arcee-ai/trinity-large-thinking` on openrouter.ai. Already integrated with OpenClaw, Cline, and Kilo Code.
- Arcee API (for teams and enterprises). Sign up at chat.arcee.ai for an API key. $0.90/million output tokens — 96% cheaper than Claude Opus. Running at 128k context with 8-bit quantization.
- Download weights directly (on-premises / research). Three versions on Hugging Face: Preview (lightly fine-tuned instruct), Base (17T-token checkpoint), and TrueBase (10T tokens, pure pretraining — no instruct data). TrueBase is ideal for regulated industries needing custom alignment from scratch.
- Set it as your default model in OpenClaw. Switch to Trinity-Large-Thinking in OpenClaw settings. Works with OpenRouter credits — no Anthropic subscription needed.
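For the OpenRouter route, a minimal request sketch follows. OpenRouter exposes an OpenAI-compatible chat completions endpoint; the model id is the one named above, but the helper function, prompt, and environment-variable name are my own illustration, not Arcee's documented setup. The sketch only builds the request; sending it requires a real API key.

```python
import json
import os
import urllib.request

def build_trinity_request(prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a chat request to Trinity via OpenRouter."""
    payload = {
        "model": "arcee-ai/trinity-large-thinking",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # OPENROUTER_API_KEY is an assumed env var name for this sketch.
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trinity_request("Summarize MoE routing in one sentence.")
# With a key set: resp = urllib.request.urlopen(req)
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client also works if you point its `base_url` at `https://openrouter.ai/api/v1`.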
Good to know before you switch
Trinity-Large-Thinking is text-only for now. Multimodal support is in development, so you'll need another model for image understanding. It's strong at agent tasks, but scores 63.2 on SWE-bench Verified vs. Claude Opus 4.6's 75.6.