
Open Source AI Is Now Cheaper Than GPT — Why Baseten Hit $13B in 5 Months



Baseten's valuation jumped from $5B to $13B in just five months. This is not a company that builds GPT or Claude. It is a startup that sells the infrastructure to run those models.

Baseten's explosion isn't a coincidence. There's a shift happening in AI cost structures that most teams don't know about yet.

30-Second Summary

Direct OpenAI API → Inference costs spike → Open source matures → Inference layer emerges → 30-50% savings

The better the models get, the more delivery costs become the battlefield

AI investment has long been concentrated in model development. Billions poured into OpenAI, Anthropic, and xAI. But there's a fact that rarely gets attention.

80-90% of total AI costs come from inference

Model training costs account for only 10-20% of total AI operational costs. The remaining 80-90% comes from actually running the models — inference. Costs are incurred every time a user sends a query.

In 2023, inference accounted for one third of all AI computing. By 2026, it has crossed two thirds, because AI is now being deployed at scale in real-world services. And there's good news: AI inference costs have plummeted from $20 per million tokens in 2023 to just $0.07. The problem is that most teams are still wired directly to OpenAI or Anthropic, paying more than they need to.
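
To put the quoted price drop in perspective, here is a minimal arithmetic sketch. The two prices are the figures cited above; the 500-million-token monthly workload and the function name `price_drop` are hypothetical and only there for illustration.

```python
# Prices quoted in the article, in USD per million tokens.
PRICE_2023 = 20.00
PRICE_NOW = 0.07


def price_drop(old: float, new: float) -> tuple[float, float]:
    """Return (fold reduction, percent reduction) between two prices."""
    if old <= 0 or new <= 0:
        raise ValueError("prices must be positive")
    return old / new, (old - new) / old * 100


fold, percent = price_drop(PRICE_2023, PRICE_NOW)
print(f"{fold:.0f}x cheaper, a {percent:.2f}% drop")

# Hypothetical workload: 500 million tokens per month (an assumption).
MONTHLY_TOKENS_M = 500
print(f"At the 2023 price:  ${MONTHLY_TOKENS_M * PRICE_2023:,.2f} per month")
print(f"At the quoted price: ${MONTHLY_TOKENS_M * PRICE_NOW:,.2f} per month")
```

Real bills depend on the model, the provider, and the input/output token mix, so treat this as a back-of-the-envelope check rather than a forecast.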

Open-source model quality has improved dramatically. Models like Llama 3.3, Mistral, and Qwen now approach GPT-4 level on many benchmarks, so routing suitable workloads to open source can cut costs by as much as 30-50%. But implementing this yourself means managing 20 clouds, dozens of models, and auto-routing logic, which is serious engineering overhead.
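
The auto-routing idea can be sketched in a few lines: for each task type, pick the cheapest model that still clears the quality bar you measured. This is an illustrative sketch only, not how any particular vendor does it. The model names, prices, quality scores, and thresholds below are all placeholders you would replace with your own evaluation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                # placeholder name, not a real product SKU
    usd_per_m_tokens: float  # illustrative price, not a real quote
    quality: float           # score from your own evals, on a 0-1 scale


# Hypothetical catalog: every value here is an assumption.
CATALOG = [
    Model("closed-frontier", 10.00, 0.95),
    Model("open-70b", 0.90, 0.88),
    Model("open-8b", 0.10, 0.72),
]


def pick_model(min_quality: float, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose eval score meets the threshold."""
    eligible = [m for m in catalog if m.quality >= min_quality]
    if not eligible:
        raise ValueError(f"no model reaches quality {min_quality}")
    return min(eligible, key=lambda m: m.usd_per_m_tokens)


# Each task type declares the minimum quality it needs (assumed values).
TASK_THRESHOLDS = {
    "classification": 0.70,
    "summarization": 0.85,
    "legal-reasoning": 0.93,
}

for task, threshold in TASK_THRESHOLDS.items():
    print(f"{task} -> {pick_model(threshold).name}")
```

The hard part in practice is not the selection rule but producing trustworthy quality scores per task, which is why the thresholds should come from tests on your own data.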

That's exactly the gap Baseten stepped into.

3x revenue in 5 months — what does Baseten actually do?

Baseten doesn't own GPUs. Instead, it connects 87 global clusters across 18 cloud providers and routes inference requests to the most cost-efficient option. Think of it as an "AI infrastructure orchestrator."
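
As a rough mental model of "route to the most cost-efficient option", the sketch below picks the cheapest healthy cluster that fits a latency budget. It is a simplified illustration, not Baseten's actual routing algorithm, which the article does not describe. The cluster names, prices, and latencies are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cluster:
    name: str                # hypothetical identifier
    usd_per_gpu_hour: float  # illustrative price
    p50_latency_ms: float    # recent median latency (assumed to be measured)
    healthy: bool = True


def route(clusters: Iterable[Cluster], max_latency_ms: float) -> Optional[Cluster]:
    """Pick the cheapest healthy cluster within the latency budget, or None."""
    candidates = [
        c for c in clusters
        if c.healthy and c.p50_latency_ms <= max_latency_ms
    ]
    return min(candidates, key=lambda c: c.usd_per_gpu_hour, default=None)


# Made-up fleet for illustration.
CLUSTERS = [
    Cluster("cloud-a-us-east", 2.10, 180.0),
    Cluster("cloud-b-eu-west", 1.60, 240.0),
    Cluster("cloud-c-us-west", 1.40, 150.0, healthy=False),
]

choice = route(CLUSTERS, max_latency_ms=200.0)
print(choice.name if choice else "no cluster fits the budget")
```

A production router would also have to account for model availability per cluster, queue depth, and cold starts, all of which are left out here.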

$200M → $600M: ARR, tripled in a single quarter

1,900%: year-over-year revenue growth

1B+: daily inference requests processed

Customers include Cursor (AI coding editor), Notion, and Mercor. Another customer, OpenEvidence, delivers real-time AI-powered medical information to hundreds of thousands of physicians worldwide, and switching to Baseten produced concrete results.

78%: latency reduction (700ms → 160ms)

6x: faster deployment

8x+: reduction in infrastructure overhead

"With Baseten, everything just works. The complexity we used to pour into underlying infrastructure is gone."
— Zachary Ziegler, Co-founder & CTO, OpenEvidence

It's not just cost savings — it absorbs engineering burden too. Baseten's revenue grew 1,900% year-over-year and inference volume grew 40x in 2025.

Criterion	Direct closed API	Via inference layer
Model choice	Locked to one provider	20+ clouds, open source included
Token costs	Fixed pricing	Savings of 50% or more possible
Latency optimization	Provider-controlled	Automatic multi-cloud routing
Deployment speed	Hours to days	Under 1 hour (OpenEvidence case)
Vendor lock-in	High	Low

How to audit where your team's AI costs are leaking

Even if you don't plan to use Baseten, what this market is telling us is clear. If you're running AI in production, you need to audit your inference cost structure right now.

1. Review your AI token costs
Pull your last 3 months of OpenAI/Anthropic invoices and identify which models are costing what. Most teams find 70-80% of costs are concentrated in just 2-3 types of API calls.
2. Classify tasks by model tier
Not every task needs GPT-4 or Claude Opus. Simple classification, summarization, and embedding can often be handled by smaller open-source models. Map out the actual performance threshold each task type requires.
3. Test open-source alternatives
Together AI, Modal Labs, and Baseten all offer free testing environments. Run your current API tasks on open-source models like Llama 3.3 or Mistral and compare the results.
4. Calculate the cost-quality tradeoff
If quality is comparable, figure out how much you'd save annually. If monthly AI costs exceed $500, the ROI on an inference layer starts to make sense.
5. Start a gradual migration
Don't overhaul your whole system at once. Start with 1-2 API call types that have the highest cost share and lowest performance requirements. Expand the scope as you monitor quality metrics.
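
The cost review and the savings calculation above can be sketched with a small script. Everything below is a hypothetical example: the line items, the call-type names, and the 30% savings rate are assumptions, and real invoices would need to be exported and mapped to call types first. Only the $500 monthly threshold comes from the article.

```python
from collections import defaultdict

# Hypothetical monthly line items: (call type, cost in USD).
LINE_ITEMS = [
    ("chat-completion", 1800.0),
    ("summarization", 900.0),
    ("classification", 450.0),
    ("embedding", 150.0),
    ("moderation", 100.0),
]

ROI_THRESHOLD_USD = 500.0  # monthly spend threshold quoted in the article


def cost_shares(items):
    """Return (call_type, cost, share_of_total), sorted by cost descending."""
    totals = defaultdict(float)
    for call_type, cost in items:
        totals[call_type] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, cost, cost / grand_total) for name, cost in ranked]


def annual_savings(monthly_cost: float, savings_rate: float) -> float:
    """Yearly savings if a call type gets cheaper by savings_rate (0-1)."""
    return monthly_cost * savings_rate * 12


shares = cost_shares(LINE_ITEMS)
for name, cost, share in shares:
    print(f"{name:<16} ${cost:>8,.2f}  {share:.0%}")

top_two = sum(share for _, _, share in shares[:2])
print(f"Top 2 call types: {top_two:.0%} of spend")

monthly_total = sum(cost for _, cost, _ in shares)
if monthly_total > ROI_THRESHOLD_USD:
    # Assumed 30% savings on one migration candidate, for illustration.
    estimate = annual_savings(900.0, 0.30)
    print(f"Migrating summarization could save about ${estimate:,.0f} per year")
```

Running this against your own exported invoices gives a ranked list of migration candidates; the savings rate should come from an actual price comparison, not from the placeholder used here.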

Open source isn't always the answer

For regulated industries (healthcare, finance, law), multimodal capabilities, or cutting-edge reasoning, closed APIs still have the edge. Don't sacrifice quality for cost savings — always validate quality on your actual tasks before migrating.
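
One way to make "validate quality on your actual tasks" concrete is a simple gate: run the current model and the candidate on the same labeled sample and migrate only if accuracy drops by no more than a tolerance. This is a minimal sketch for tasks with exact expected answers, such as classification. The function names, sample labels, and the 2-point tolerance are assumptions, and free-form outputs like summaries would need a different scoring method.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that exactly match the expected labels."""
    if not expected or len(predictions) != len(expected):
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return sum(p == e for p, e in zip(predictions, expected)) / len(expected)


def safe_to_migrate(baseline_preds, candidate_preds, expected, max_drop=0.02):
    """True if the candidate is within max_drop accuracy of the baseline."""
    drop = accuracy(baseline_preds, expected) - accuracy(candidate_preds, expected)
    return drop <= max_drop


# Hypothetical labeled sample from a ticket-classification task.
EXPECTED = ["billing", "bug", "bug", "sales", "billing",
            "bug", "sales", "billing", "bug", "sales"]
BASELINE = ["billing", "bug", "bug", "sales", "billing",
            "bug", "sales", "billing", "bug", "bug"]      # 9 of 10 correct
CANDIDATE = ["billing", "bug", "bug", "sales", "billing",
             "bug", "sales", "billing", "sales", "sales"]  # 9 of 10 correct

print("baseline accuracy: ", accuracy(BASELINE, EXPECTED))
print("candidate accuracy:", accuracy(CANDIDATE, EXPECTED))
print("safe to migrate:   ", safe_to_migrate(BASELINE, CANDIDATE, EXPECTED))
```

A sample of ten items is far too small for a real decision; it is used here only to keep the example readable.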

Further reading

Announcing Baseten's Series F

Official $1.5B fundraise announcement — 20x revenue growth and 40x inference volume growth figures.

OpenEvidence Case Study

Real-world case: 78% latency reduction and 6x faster deployment for a medical AI startup.

AI Inference vs Training Infrastructure: Why the Economics Are Diverging

Deep analysis of inference vs training economics with key data.

Baseten Revenue & Funding Analysis

Sacra's independent quantitative analysis — valuation multiples and growth trajectory.

Baseten Nears $1.5B Raise, Tripling in Five Months

In-depth analysis of the round structure and competitive landscape.

AI inference startup Baseten reportedly raising $1.5B

Original TechCrunch report — funding context and market dynamics.

FAQ

Can an inference layer like Baseten fully replace the OpenAI API?

Not for every task. It's most realistic to start by switching lower-stakes tasks — simple classification, summarization, and embedding — to open source. Tasks that require the unique capabilities of the latest GPT-4/Claude models still need direct API access.

Has open-source AI really reached GPT-4 level?

For general-purpose tasks, it is getting very close. Models like Llama 3.3 70B and Qwen2.5 72B match or surpass GPT-4 Turbo on many benchmarks. That said, gaps remain for multimodal tasks, complex reasoning, and cutting-edge knowledge. Testing on your actual use case is the most accurate approach.

Are there inference layer services besides Baseten?

Yes. Together AI, Fireworks AI, and Modal Labs are in the same space. Together AI has crossed $1B in ARR, and Modal Labs has a $4.65B valuation. Positioning differs between them, so compare based on the model types you use and your cloud setup.

If OpenAI keeps cutting prices, won't inference layers become irrelevant?

Token prices keep dropping, but open-source model quality is rising just as fast. The core value of an inference layer is multi-cloud flexibility, avoiding vendor lock-in, and reducing engineering overhead — not just cost savings. Baseten's CEO has said that better open source means more growth for them too.

I'm currently using AI APIs directly. What should I do right now?

Start by identifying the highest-cost API call types in your last 3 months of invoices. If monthly AI costs are under $500, there's no urgent rush. Above that, try running those tasks on open-source models in Together AI's or Baseten's free testing environment. About 30 minutes should be enough for a rough cost-quality tradeoff estimate.

Written by Rush

Tracking where business meets AI.
