Valuation jumped from $5B to $13B in just five months. Not a company that builds GPT or Claude. A startup that sells the infrastructure to run those models.
Baseten's explosion isn't a coincidence. There's a shift happening in AI cost structures that most teams don't know about yet.
The better the models get, the more delivery costs become the battlefield
AI investment has long been concentrated in model development. Billions poured into OpenAI, Anthropic, and xAI. But there's a fact that rarely gets attention.
80-90% of total AI costs come from inference
Model training costs account for only 10-20% of total AI operational costs. The remaining 80-90% comes from actually running the models — inference. Costs are incurred every time a user sends a query.
In 2023, inference accounted for 1/3 of all AI computing. By 2026, it has crossed 2/3. AI is now being deployed at scale in real-world services. And there's good news: AI inference costs have plummeted from $20 per million tokens in 2023 to just $0.07. The problem is that most teams are still wired directly to OpenAI or Anthropic, paying more than they need to.
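To make the price drop concrete, here is a minimal sketch of the arithmetic. The 500-million-token monthly workload is a hypothetical figure chosen for illustration; the two unit prices are the ones cited above.

```python
def monthly_inference_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Return the monthly bill in USD for a token volume and a per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 500 million tokens per month.
TOKENS_PER_MONTH = 500_000_000

cost_2023 = monthly_inference_cost(TOKENS_PER_MONTH, 20.00)  # 2023 price cited above
cost_now = monthly_inference_cost(TOKENS_PER_MONTH, 0.07)    # low-end price cited above

print(f"2023 pricing: ${cost_2023:,.2f}/month")
print(f"Current low-end pricing: ${cost_now:,.2f}/month")
```

At that assumed volume, the bill falls from $10,000 to $35 a month, which is why the unit price you actually pay matters more than the headline trend.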
Open-source model quality has improved dramatically. Models like Llama 3.3, Mistral, and Qwen now approach GPT-4 level on many benchmarks, so routing suitable requests to open-source models can cut costs by 30-50%. But implementing this yourself means integrating as many as 20 clouds and dozens of models, plus writing auto-routing logic. That is serious engineering overhead.
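The core of that auto-routing logic can be sketched in a few lines: send each request to the cheapest model that meets the quality bar for its task. This is a simplified illustration, not any vendor's implementation. The model names, prices, tier scale, and task-to-tier mapping are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality_tier: int              # hypothetical scale: 1 (basic) to 3 (frontier)
    usd_per_million_tokens: float  # placeholder price, not a real quote


# Hypothetical catalog for illustration only.
CATALOG = [
    ModelOption("small-open-model", 1, 0.10),
    ModelOption("large-open-model", 2, 0.60),
    ModelOption("closed-frontier-model", 3, 10.00),
]

# Hypothetical minimum tier each task type needs.
REQUIRED_TIER = {
    "classification": 1,
    "summarization": 2,
    "complex_reasoning": 3,
}


def pick_model(task_type: str, catalog: list[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest model whose tier meets the task's minimum tier."""
    top_tier = max(m.quality_tier for m in catalog)
    # Unknown task types fall back to the top tier, so quality is never silently cut.
    required = REQUIRED_TIER.get(task_type, top_tier)
    eligible = [m for m in catalog if m.quality_tier >= required]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


for task in ("classification", "summarization", "complex_reasoning"):
    print(task, "->", pick_model(task).name)
```

The hard part in practice is not this selection step but keeping the catalog, prices, and tier assignments accurate across providers, which is the overhead described above.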
That's exactly the gap Baseten stepped into.
3x revenue in 5 months — what does Baseten actually do?
Baseten doesn't own GPUs. Instead, it connects 87 global clusters across 18 cloud providers and routes inference requests to the most cost-efficient option. Think of it as an "AI infrastructure orchestrator."
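As a rough illustration of what "route to the most cost-efficient option" can mean, the sketch below picks the cheapest healthy cluster that fits a latency budget. It is not Baseten's actual algorithm, which the source does not describe. The cluster names, the price and latency figures, and the selection criteria are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    name: str
    usd_per_gpu_hour: float  # assumed cost metric
    p95_latency_ms: float    # assumed latency metric
    healthy: bool


def choose_cluster(clusters: list[Cluster], max_latency_ms: float) -> Optional[Cluster]:
    """Return the cheapest healthy cluster within the latency budget, or None."""
    candidates = [
        c for c in clusters
        if c.healthy and c.p95_latency_ms <= max_latency_ms
    ]
    if not candidates:
        return None
    # Cheapest first; ties are broken by lower latency.
    return min(candidates, key=lambda c: (c.usd_per_gpu_hour, c.p95_latency_ms))


# Hypothetical fleet for illustration only.
CLUSTERS = [
    Cluster("cloud-a-us-east", 2.10, 180.0, True),
    Cluster("cloud-b-eu-west", 1.70, 420.0, True),  # cheap, but slow
    Cluster("cloud-c-us-west", 1.90, 210.0, True),
    Cluster("cloud-d-asia", 1.50, 150.0, False),    # cheapest, but unhealthy
]

chosen = choose_cluster(CLUSTERS, max_latency_ms=250.0)
print(chosen.name if chosen else "no cluster available")
```

Returning None rather than a fallback cluster forces the caller to decide what to do when nothing meets the budget, for example relaxing the latency limit or queuing the request.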
Customers include Cursor (AI coding editor), Notion, and Mercor. Another customer, OpenEvidence, delivers real-time AI-powered medical information to hundreds of thousands of physicians worldwide, and switching to Baseten produced concrete results.
"With Baseten, everything just works. The complexity we used to pour into underlying infrastructure is gone."
— Zachary Ziegler, Co-founder & CTO, OpenEvidence
The benefit isn't just cost savings; the inference layer absorbs engineering burden too. Baseten's revenue grew 1,900% year-over-year, and its inference volume grew 40x in 2025.
| | Direct closed API | Via inference layer |
|---|---|---|
| Model choice | Locked to one provider | 20+ clouds, open source included |
| Token costs | Fixed pricing | Up to 50% savings possible |
| Latency optimization | Provider-controlled | Automatic multi-cloud routing |
| Deployment speed | Hours to days | Under 1 hour (OpenEvidence case) |
| Vendor lock-in | High | Low |
How to audit where your team's AI costs are leaking
Even if you don't plan to use Baseten, what this market is telling us is clear. If you're running AI in production, you need to audit your inference cost structure right now.
- Review your AI token costs
Pull your last 3 months of OpenAI/Anthropic invoices and identify which models are costing what. Most teams find 70-80% of costs are concentrated in just 2-3 types of API calls.
- Classify tasks by model tier
Not every task needs GPT-4 or Claude Opus. Simple classification, summarization, and embedding can often be handled by smaller open-source models. Map out the actual performance threshold each task type requires.
- Test open-source alternatives
Together AI, Modal Labs, and Baseten all offer free testing environments. Run your current API tasks on open-source models like Llama 3.3 or Mistral and compare the results.
- Calculate the cost-quality tradeoff
If quality is comparable, figure out how much you'd save annually. If monthly AI costs exceed $500, the ROI on an inference layer starts to make sense.
- Start a gradual migration
Don't overhaul your whole system at once. Start with 1-2 API call types that have the highest cost share and lowest performance requirements. Expand the scope as you monitor quality metrics.
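The first and fourth steps above come down to simple aggregation and arithmetic, sketched here in Python. The invoice line items, the call-type labels, and the 30% savings rate are hypothetical. In practice you would load the line items from your provider's billing export, whose format is not covered here.

```python
from collections import defaultdict


def cost_share_by_call_type(line_items):
    """Aggregate (call_type, usd) pairs into (call_type, usd, share) rows, largest first."""
    totals = defaultdict(float)
    for call_type, usd in line_items:
        totals[call_type] += usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, usd, usd / grand_total) for name, usd in ranked]


def annual_savings(monthly_usd: float, savings_rate: float) -> float:
    """Project yearly savings if savings_rate (0.0 to 1.0) of monthly_usd is cut."""
    return monthly_usd * savings_rate * 12


# Hypothetical one-month invoice, already labeled by call type.
INVOICE = [
    ("chat_completion", 600.0),
    ("summarization", 250.0),
    ("classification", 100.0),
    ("embedding", 50.0),
    ("chat_completion", 200.0),
]

for name, usd, share in cost_share_by_call_type(INVOICE):
    print(f"{name:<16} ${usd:>8.2f}  {share:6.1%}")

# Assume summarization and classification move to a cheaper model at 30% lower cost.
migratable_monthly = 250.0 + 100.0
print(f"Projected annual savings: ${annual_savings(migratable_monthly, 0.30):,.2f}")
```

Ranking by share shows which call types are worth testing first, and the savings projection gives you a number to weigh against the engineering cost of migrating.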
Open source isn't always the answer
For regulated industries (healthcare, finance, law), multimodal capabilities, or cutting-edge reasoning, closed APIs still have the edge. Don't sacrifice quality for cost savings — always validate quality on your actual tasks before migrating.
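One way to make "validate quality on your actual tasks" concrete is a simple gate: compare the candidate model against your current model on a labeled sample and migrate only if the drop stays within a tolerance. The sketch below assumes a task with exact-match labels, such as classification. The 2-point tolerance and the sample data are hypothetical, and open-ended tasks like summarization would need a different metric.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that exactly match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def safe_to_migrate(
    baseline_preds: list[str],
    candidate_preds: list[str],
    labels: list[str],
    max_drop: float = 0.02,  # hypothetical tolerance: at most 2 points of accuracy lost
) -> bool:
    """Return True if the candidate is within max_drop of the baseline's accuracy."""
    baseline = accuracy(baseline_preds, labels)
    candidate = accuracy(candidate_preds, labels)
    return candidate >= baseline - max_drop


# Hypothetical labeled sample drawn from real production requests.
LABELS = ["spam", "ok", "spam", "ok"]
BASELINE = ["spam", "ok", "spam", "ok"]   # current closed-API model
CANDIDATE = ["spam", "ok", "ok", "ok"]    # open-source candidate

print("Safe to migrate:", safe_to_migrate(BASELINE, CANDIDATE, LABELS))
```

A four-item sample is only for readability. A real gate needs enough examples per task type for the accuracy difference to be meaningful.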




