In a market where Big Tech spends billions and fields thousands of engineers, a 26-person startup built a 400B-parameter open-source LLM for $20 million. And it became the #1 most-used open model on OpenClaw.
What is this?
Arcee AI is a San Francisco startup that most people haven't heard of yet. It started as a B2B business doing LLM fine-tuning for enterprise clients like SK Telecom — but CEO Mark McQuade decided they couldn't keep relying on other companies' models. So in late 2025, they started building their own from scratch.
The result is the Trinity series. They shipped smaller models first (Nano 6B, Mini 26B) in December 2025, then Trinity Large (400B) in January 2026, and finally Trinity-Large-Thinking — a reasoning-enhanced version — on April 1, 2026. All of this happened in just 9 months on a total budget of $20 million.
The timing is significant. When Anthropic announced that Claude Code subscribers would need to pay extra to use OpenClaw, the community started looking for alternatives. Trinity-Large-Thinking scored 91.9 on PinchBench — the benchmark specifically designed for OpenClaw agent tasks — just behind Claude Opus 4.6's 93.3. At $0.90 per million output tokens, it's 96% cheaper.
What makes it different?
The biggest differentiator is the license. Trinity ships under Apache 2.0 — no strings attached. Meta's Llama has faced criticism for its restrictive commercial conditions, and Chinese models (DeepSeek, Qwen) — while technically impressive — are a no-go for many U.S. and European companies due to data sovereignty concerns.
Trinity fills that gap. Anyone can download the weights, run them on-premises, fine-tune on their own data, and deploy commercially. No restrictions. Hugging Face co-founder Clement Delangue put it well: "The strength of the US has always been its startups. Arcee shows that it's possible!"
| | Claude / GPT-4o (Closed) | Trinity-Large-Thinking | Llama 4 (Meta) |
|---|---|---|---|
| License | API lock-in, proprietary | Apache 2.0 (fully open) | Meta conditional license |
| On-premises | Not possible | Yes (download weights) | Yes (commercial limits apply) |
| PinchBench | 93.3 (Opus 4.6) | 91.9 | N/A |
| Cost (1M output tokens) | $25 (Opus) | $0.90 | Varies by cloud |
| Active parameters | Dense architecture | 13B active / 400B total (MoE) | Maverick MoE |
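The "96% cheaper" headline is simple arithmetic on the per-token prices in the table above; a quick sketch to verify it (variable names are mine, prices from the article):

```python
# Output-token prices per 1M tokens, as quoted in the comparison table.
opus_price = 25.00      # Claude Opus
trinity_price = 0.90    # Trinity-Large-Thinking

# Relative savings: 1 - (0.90 / 25.00) = 0.964, i.e. ~96% cheaper.
savings_pct = (1 - trinity_price / opus_price) * 100
print(f"Trinity is {savings_pct:.1f}% cheaper per 1M output tokens")
```

The exact figure is 96.4%, which the article rounds down to 96%.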
The architecture is clever. Trinity uses a Mixture-of-Experts (MoE) design with 256 experts, of which only 4 (1.56%) activate per token. Total parameters: 400B; active at inference: just 13B, about 3.3% of the total. The result is 2–3x faster inference than comparable dense models on the same hardware.
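To make the routing idea concrete, here is a minimal sketch of top-k MoE routing in plain Python. This is not Arcee's implementation; the dimensions are toy-sized, but the mechanism is the same: a gating network scores all experts, only the top 4 of 256 actually run, and their outputs are blended with softmax weights.

```python
import math
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 8, 256, 4
calls = []  # records which experts actually execute

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each "expert" is a tiny linear map (a DIM x DIM matrix).
expert_weights = [[rand_vec(DIM) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def run_expert(i, x):
    calls.append(i)  # count real compute: only routed experts land here
    return [dot(row, x) for row in expert_weights[i]]

# The router ("gate") produces one logit per expert for each token.
gate = [rand_vec(DIM) for _ in range(N_EXPERTS)]

def moe_forward(x):
    logits = [dot(g, x) for g in gate]
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in top)
    w = [math.exp(logits[i] - m) for i in top]      # softmax over top-k only
    s = sum(w)
    w = [v / s for v in w]
    outs = [run_expert(i, x) for i in top]
    # Weighted sum of the k expert outputs.
    return [sum(wi * o[d] for wi, o in zip(w, outs)) for d in range(DIM)]

y = moe_forward(rand_vec(DIM))
print(f"experts run: {len(calls)} of {N_EXPERTS}")
```

At scale this is why 400B total parameters can cost only 13B parameters' worth of compute per token: the other 252 experts sit idle for that token.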
Quick start: How to use it
There are three ways to get started with Trinity, plus one setting to flip in OpenClaw.
- Try it on OpenRouter (fastest). Select `arcee-ai/trinity-large-thinking` on openrouter.ai. Already integrated with OpenClaw, Cline, and Kilo Code.
- Arcee API (for teams and enterprises). Sign up at chat.arcee.ai for an API key. $0.90/million output tokens — 96% cheaper than Claude Opus. Running at 128k context with 8-bit quantization.
- Download weights directly (on-premises / research). Three versions on Hugging Face: Preview (lightly fine-tuned instruct), Base (17T-token checkpoint), and TrueBase (10T tokens, pure pretraining — no instruct data). TrueBase is ideal for regulated industries needing custom alignment from scratch.
- Set it as your default model in OpenClaw. Switch to Trinity-Large-Thinking in OpenClaw settings. Works with OpenRouter credits — no Anthropic subscription needed.
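For the OpenRouter route, a minimal request sketch follows. OpenRouter exposes an OpenAI-compatible chat completions endpoint; the model id is the one named above, but the helper function, prompt, and environment-variable name are my own illustration, not Arcee's documented setup. The sketch only builds the request; sending it requires a real API key.

```python
import json
import os
import urllib.request

def build_trinity_request(prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a chat request to Trinity via OpenRouter."""
    payload = {
        "model": "arcee-ai/trinity-large-thinking",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # OPENROUTER_API_KEY is an assumed env var name for this sketch.
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trinity_request("Summarize MoE routing in one sentence.")
# With a key set: resp = urllib.request.urlopen(req)
```

Because the endpoint is OpenAI-compatible, the official `openai` Python client also works if you point its `base_url` at `https://openrouter.ai/api/v1`.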
Good to know before you switch
Trinity-Large-Thinking is text-only for now. Multimodal support is in development, so you'll need another model for image understanding. It's strong at agent tasks, but scores 63.2 on SWE-bench Verified vs. Claude Opus 4.6's 75.6.