Baseten's valuation jumped from $5B to $13B in just five months. This is not a company that builds GPT or Claude. It is a startup that sells the infrastructure to run those models.

Baseten's explosive growth isn't a coincidence. AI cost structures are shifting in a way most teams haven't noticed yet.

30-Second Summary
Direct OpenAI API → Inference costs spike → Open source matures → Inference layer emerges → Up to 30% savings

The better the models get, the more delivery costs become the battlefield

AI investment has long been concentrated in model development. Billions poured into OpenAI, Anthropic, and xAI. But there's a fact that rarely gets attention.

80-90% of total AI costs come from inference

Model training costs account for only 10-20% of total AI operational costs. The remaining 80-90% comes from actually running the models — inference. Costs are incurred every time a user sends a query.

In 2023, inference accounted for 1/3 of all AI computing. By 2026, it has crossed 2/3. AI is now being deployed at scale in real-world services. And there's good news: AI inference costs have plummeted from $20 per million tokens in 2023 to just $0.07. The problem is that most teams are still wired directly to OpenAI or Anthropic, paying more than they need to.

Open-source model quality has improved dramatically. Models like Llama 3.3, Mistral, and Qwen now approach GPT-4 level on many benchmarks, meaning that routing suitable tasks to open source can cut costs by as much as 30-50%. But implementing this yourself means integrating around 20 clouds, managing dozens of models, and building auto-routing logic, which is serious engineering overhead.
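To make the routing idea concrete, here is a minimal sketch of tier-based model selection. The model names, prices, and tier assignments are hypothetical placeholders, not real price lists or benchmark results; the point is only that each task type goes to the cheapest model that clears its quality bar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_million_tokens: float
    tier: int  # 1 = small open model, 2 = larger open model, 3 = frontier closed model


# Hypothetical catalog: names and prices are illustrative assumptions.
MODELS = [
    ModelOption("small-open-model", 0.10, 1),
    ModelOption("large-open-model", 0.60, 2),
    ModelOption("frontier-closed-model", 10.00, 3),
]

# Assumed minimum tier per task type; real thresholds should come from testing.
REQUIRED_TIER = {
    "classification": 1,
    "summarization": 2,
    "complex_reasoning": 3,
}


def pick_model(task_type: str) -> ModelOption:
    """Return the cheapest model whose tier meets the task's requirement."""
    # Unknown task types fall back to the most capable tier to protect quality.
    required = REQUIRED_TIER.get(task_type, 3)
    eligible = [m for m in MODELS if m.tier >= required]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)
```

Defaulting unknown task types to the top tier is a deliberately conservative choice: it costs more, but it never silently downgrades a task nobody has evaluated.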

That's exactly the gap Baseten stepped into.

3x revenue in 5 months — what does Baseten actually do?

Baseten doesn't own GPUs. Instead, it connects 87 global clusters across 18 cloud providers and routes inference requests to the most cost-efficient option. Think of it as an "AI infrastructure orchestrator."
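Baseten's actual routing logic isn't public in this article, so the following is only a rough sketch of what "route to the most cost-efficient option" could mean. The `Cluster` fields, the latency budget, and the health flag are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cluster:
    name: str                # hypothetical cluster identifier
    usd_per_gpu_hour: float  # assumed cost metric
    p50_latency_ms: float    # assumed recent median latency
    healthy: bool            # assumed health-check result


def choose_cluster(clusters: List[Cluster], max_latency_ms: float) -> Optional[Cluster]:
    """Pick the cheapest healthy cluster that fits the latency budget."""
    candidates = [
        c for c in clusters
        if c.healthy and c.p50_latency_ms <= max_latency_ms
    ]
    if not candidates:
        # Nothing qualifies; the caller decides whether to relax the budget.
        return None
    return min(candidates, key=lambda c: c.usd_per_gpu_hour)
```

Cost is only the tiebreaker here: clusters are first filtered by health and latency, so a cheap but slow or failing cluster is never selected.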

$200M→$600M: ARR, tripled in a single quarter
1,900%: Year-over-year revenue growth
1B+: Daily inference requests processed

Customers include Cursor (AI coding editor), Notion, and Mercor. Among them, OpenEvidence delivers real-time AI-powered medical information to hundreds of thousands of physicians worldwide — and switching to Baseten produced concrete results.

78%: Latency reduction (700ms → 160ms)
6x: Faster deployment
8x+: Reduction in infrastructure overhead

"With Baseten, everything just works. The complexity we used to pour into underlying infrastructure is gone."

— Zachary Ziegler, Co-founder & CTO, OpenEvidence

An inference layer does more than cut costs; it absorbs engineering burden too. Baseten's revenue grew 1,900% year-over-year, and its inference volume grew 40x in 2025.

Aspect | Direct closed API | Via inference layer
Model choice | Locked to one provider | 20+ clouds, open source included
Token costs | Fixed pricing | Up to 50%+ savings possible
Latency optimization | Provider-controlled | Automatic multi-cloud routing
Deployment speed | Hours to days | Under 1 hour (OpenEvidence case)
Vendor lock-in | High | Low

How to audit where your team's AI costs are leaking

Even if you don't plan to use Baseten, what this market is telling us is clear. If you're running AI in production, you need to audit your inference cost structure right now.

  1. Review your AI token costs
    Pull your last 3 months of OpenAI/Anthropic invoices and identify which models are costing what. Most teams find 70-80% of costs are concentrated in just 2-3 types of API calls.
  2. Classify tasks by model tier
    Not every task needs GPT-4 or Claude Opus. Simple classification, summarization, and embedding can often be handled by smaller open-source models. Map out the actual performance threshold each task type requires.
  3. Test open-source alternatives
    Together AI, Modal Labs, and Baseten all offer free testing environments. Run your current API tasks on open-source models like Llama 3.3 or Mistral and compare the results.
  4. Calculate the cost-quality tradeoff
    If quality is comparable, figure out how much you'd save annually. If monthly AI costs exceed $500, the ROI on an inference layer starts to make sense.
  5. Start a gradual migration
    Don't overhaul your whole system at once. Start with 1-2 API call types that have the highest cost share and lowest performance requirements. Expand the scope as you monitor quality metrics.
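The audit steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the invoice rows are made-up numbers, the call-type names are hypothetical, and the quality-bar scale (1 = low requirement, 3 = high) is an invented convention. The sketch covers cost concentration (step 1), projected annual savings (step 4), and picking the first migration candidates (step 5).

```python
from collections import defaultdict

# Hypothetical invoice rows: (call type, model, monthly cost in USD).
INVOICE_ROWS = [
    ("support_summarization", "frontier-closed-model", 4200.0),
    ("ticket_classification", "frontier-closed-model", 2600.0),
    ("code_review", "frontier-closed-model", 1500.0),
    ("embedding", "small-embedding-model", 300.0),
]

# Assumed quality requirement per call type (1 = low, 3 = high).
QUALITY_BAR = {
    "support_summarization": 2,
    "ticket_classification": 1,
    "code_review": 3,
    "embedding": 1,
}


def cost_share_by_call_type(rows):
    """Return (call_type, share_of_total) pairs, largest share first."""
    totals = defaultdict(float)
    for call_type, _model, cost in rows:
        totals[call_type] += cost
    grand_total = sum(totals.values())
    shares = [(ct, cost / grand_total) for ct, cost in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def annual_savings(monthly_cost, savings_rate):
    """Project yearly savings for an assumed fractional savings rate."""
    return monthly_cost * savings_rate * 12


def migration_candidates(shares, quality_bar, max_bar, top_n=2):
    """Call types at or below max_bar, in order of descending cost share."""
    # Call types with no recorded quality bar are treated as ineligible.
    eligible = [
        ct for ct, _share in shares
        if quality_bar.get(ct, float("inf")) <= max_bar
    ]
    return eligible[:top_n]


shares = cost_share_by_call_type(INVOICE_ROWS)
first_wave = migration_candidates(shares, QUALITY_BAR, max_bar=2)
```

With real invoices, the savings rate passed to `annual_savings` should come from measured results on open-source alternatives rather than from a headline figure.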

Open source isn't always the answer

In regulated industries (healthcare, finance, law), and for tasks that need multimodal capabilities or cutting-edge reasoning, closed APIs still have the edge. Don't sacrifice quality for cost savings: always validate quality on your actual tasks before migrating.
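One simple way to validate quality before migrating is a gate that compares a candidate model against your current model on labeled examples from your own tasks. The sketch below assumes exact-match scoring, which suits classification-style outputs; free-form text would need a different metric. The function names and the 2-point tolerance are illustrative assumptions.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def passes_quality_gate(baseline_preds, candidate_preds, labels, max_drop=0.02):
    """Allow migration only if the candidate loses at most max_drop accuracy."""
    baseline = accuracy(baseline_preds, labels)
    candidate = accuracy(candidate_preds, labels)
    return candidate >= baseline - max_drop
```

Measuring against the current model rather than an absolute target keeps the question narrow: does switching make results noticeably worse on the work you actually do?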