In a market where Big Tech is spending billions with thousands of engineers, a 26-person startup built a 400B-parameter open-source LLM for $20 million. And it became the #1 most-used open model on OpenClaw.

3-second summary

  - 26-person startup: Arcee
  - $20M budget, 33-day training run
  - 400B-parameter open-source Trinity LLM
  - #1 most-used open model on OpenClaw
  - 96% cheaper than Claude

What is this?

Arcee AI is a San Francisco startup that most people haven't heard of yet. It started as a B2B business doing LLM fine-tuning for enterprise clients like SK Telecom — but CEO Mark McQuade decided they couldn't keep relying on other companies' models. So in late 2025, they started building their own from scratch.

The result is the Trinity series. They shipped smaller models first (Nano 6B, Mini 26B) in December 2025, then Trinity Large (400B) in January 2026, and finally Trinity-Large-Thinking — a reasoning-enhanced version — on April 1, 2026. All of this happened in just 9 months on a total budget of $20 million.

The timing is significant. When Anthropic announced that Claude Code subscribers would need to pay extra to use OpenClaw, the community started looking for alternatives. Trinity-Large-Thinking scored 91.9 on PinchBench — the benchmark specifically designed for OpenClaw agent tasks — just behind Claude Opus 4.6's 93.3. At $0.90 per million output tokens, it's 96% cheaper.

  - 26: total team size
  - $20M: total Trinity development cost
  - 96%: cheaper than Claude Opus
  - 3.37T: tokens served in first 2 months

What makes it different?

The biggest differentiator is the license. Trinity ships under Apache 2.0 — no strings attached. Meta's Llama has faced criticism for its restrictive commercial conditions, and Chinese models (DeepSeek, Qwen) — while technically impressive — are a no-go for many U.S. and European companies due to data sovereignty concerns.

Trinity fills that gap. Anyone can download the weights, run them on-premises, fine-tune on their own data, and deploy commercially. No restrictions. Hugging Face co-founder Clement Delangue put it well: "The strength of the US has always been its startups. Arcee shows that it's possible!"

|  | Claude / GPT-4o (closed) | Trinity-Large-Thinking | Llama 4 (Meta) |
|---|---|---|---|
| License | API lock-in, proprietary | Apache 2.0 (fully open) | Meta conditional license |
| On-premises | Not possible | Yes (download weights) | Yes (commercial limits apply) |
| PinchBench | 93.3 (Opus 4.6) | 91.9 | N/A |
| Cost (1M output tokens) | $25 (Opus) | $0.90 | Varies by cloud |
| Active parameters | Dense architecture | 13B active / 400B total (MoE) | Maverick MoE |
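The "96% cheaper" figure can be sanity-checked directly from the table's per-token rates. A quick sketch (rates from the table above; token volumes are illustrative):

```python
# Per-million-output-token rates from the comparison table above.
OPUS_PER_M_OUTPUT = 25.00     # USD, Claude Opus
TRINITY_PER_M_OUTPUT = 0.90   # USD, Trinity-Large-Thinking

def output_cost(output_tokens: int, rate_per_m: float) -> float:
    """Cost in USD for a given number of generated (output) tokens."""
    return output_tokens / 1_000_000 * rate_per_m

savings = 1 - TRINITY_PER_M_OUTPUT / OPUS_PER_M_OUTPUT
print(f"Savings: {savings:.1%}")  # → 96.4%, the "96% cheaper" headline

# Example volume: 100M output tokens in a month.
print(f"Opus:    ${output_cost(100_000_000, OPUS_PER_M_OUTPUT):,.2f}")     # → $2,500.00
print(f"Trinity: ${output_cost(100_000_000, TRINITY_PER_M_OUTPUT):,.2f}")  # → $90.00
```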

The architecture is clever. Trinity uses a Mixture-of-Experts (MoE) design with 256 experts per layer, of which only 4 (1.56%) activate for each token. Total parameters: 400B. Active at inference: just 13B, roughly 3.25% of the model. The result is 2–3x faster inference than comparable dense models on the same hardware.
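To make the routing concrete, here is a minimal sketch of top-4-of-256 expert selection. This is illustrative only: Arcee hasn't published Trinity's router, so the gating below is a generic softmax top-k of the kind common in open MoE models.

```python
import math
import random

NUM_EXPERTS = 256  # experts per MoE layer
TOP_K = 4          # experts activated per token

def route(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token; softmax-normalize over the selected k."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router output
chosen = route(logits)
print(f"{len(chosen)} of {NUM_EXPERTS} experts active "
      f"({len(chosen) / NUM_EXPERTS:.2%} of experts per token)")
# → 4 of 256 experts active (1.56% of experts per token)
```

Note that 1.56% is the fraction of experts that fire, not the fraction of parameters: shared layers (embeddings, attention) are always active, which is why 13B of 400B parameters (about 3.25%) run per token.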

Quick start: How to use it

There are four ways to get started with Trinity.

  1. Try it on OpenRouter (fastest)
    Select arcee-ai/trinity-large-thinking on openrouter.ai. Already integrated with OpenClaw, Cline, and Kilo Code.
  2. Arcee API (for teams and enterprises)
    Sign up at chat.arcee.ai for an API key. $0.90/million output tokens — 96% cheaper than Claude Opus. Running at 128k context with 8-bit quantization.
  3. Download weights directly (on-premises / research)
    Three versions on Hugging Face: Preview (lightly fine-tuned instruct), Base (17T token checkpoint), TrueBase (10T tokens, pure pretraining — no instruct data). TrueBase is ideal for regulated industries needing custom alignment from scratch.
  4. Set it as your default model in OpenClaw
    Switch to Trinity-Large-Thinking in OpenClaw settings. Works with OpenRouter credits — no Anthropic subscription needed.
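For option 1, a call through OpenRouter's OpenAI-compatible chat endpoint might look like the sketch below. The endpoint URL and model ID follow OpenRouter's published conventions; `OPENROUTER_API_KEY` is a placeholder environment variable you'd set yourself.

```python
import json
import os
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "arcee-ai/trinity-large-thinking"  # Trinity's ID on OpenRouter

def build_request(prompt: str) -> dict:
    """Assemble the JSON payload for a single-turn chat completion."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt and return the model's reply text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize MoE routing in one sentence."))
```

Because the endpoint is OpenAI-compatible, existing OpenAI SDK clients should also work by pointing the base URL at openrouter.ai.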

Good to know before you switch

Trinity-Large-Thinking is text-only for now. Multimodal support is in development, so you'll need another model for image understanding. It's strong at agent tasks, but scores 63.2 on SWE-bench Verified vs. Claude Opus 4.6's 75.6.