
It's Not the GPU Slowing Down AI — Why $355M Is Betting on the Memory Bottleneck


When AI feels slow, the obvious fix seems to be buying more GPUs. But doubling your GPU count does not double token generation speed. Over the past decade, AI chip compute capability grew 80×, while memory bandwidth grew only 17×. Today's AI bottleneck is not the brain — it is the bloodstream.

3-second summary

More GPUs ≠ faster AI → Real bottleneck = memory bandwidth → In-memory computing emerges → Fractile $220M + XCENA $135M → 2027: AI cost structure shifts

What everyone believes: more GPUs mean faster AI

An NVIDIA H100 runs about $30,000. The B200 costs roughly twice that. AI companies pour billions into GPUs because they believe the formula: more GPUs = more compute = faster AI.

But look at memory bandwidth and the story changes. The NVIDIA H100 can move 3.35 TB of data per second between its memory and its compute cores. The H200 raised that to 4.8 TB/s, a 43% improvement. The problem is that compute performance has grown far faster than bandwidth across GPU generations, so plenty of compute power sits idle, waiting for data to arrive from memory.

This is what engineers call the "Memory Wall." Every time an LLM generates a token, it has to read the model's weights from memory, which for a large model means hundreds of gigabytes. That read is the bottleneck: no matter how many compute cores you add, slow memory means waiting. The gap between 80× compute growth and 17× bandwidth growth over a decade is the essence of today's bottleneck.
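To make the memory wall concrete, here is a back-of-envelope sketch. It assumes single-stream decoding (batch size 1) of a hypothetical 70B-parameter dense model stored at 2 bytes per parameter, and it ignores compute time, KV-cache reads, and batching. Under those assumptions every generated token has to stream all ~140 GB of weights, so memory bandwidth alone caps tokens per second:

```python
# Rough upper bound on single-stream decode speed when generation is
# memory-bandwidth bound: each new token must stream every weight once.
# Assumptions (not from the article): a hypothetical 70B-parameter dense
# model at 2 bytes per parameter (FP16/BF16), batch size 1, and no time
# spent on compute, KV-cache reads, or other overheads.


def max_tokens_per_second(bandwidth_tb_s: float, params_billion: float,
                          bytes_per_param: float = 2.0) -> float:
    """Upper bound on tokens/sec = memory bandwidth / bytes read per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_s / bytes_per_token


if __name__ == "__main__":
    # Bandwidth figures quoted in the article (TB/s).
    for name, bw in [("H100", 3.35), ("H200", 4.8)]:
        tps = max_tokens_per_second(bw, params_billion=70)
        print(f"{name}: <= {tps:.1f} tokens/sec for a 70B FP16 model")
    # The decade-long gap: compute grew 80x while bandwidth grew 17x.
    print(f"compute/bandwidth growth ratio: {80 / 17:.1f}x")
```

On these assumptions, the H100's 3.35 TB/s caps the hypothetical model at roughly 24 tokens/sec and the H200's 4.8 TB/s at roughly 34, regardless of how much compute either chip has. Real servers batch many requests so that one weight read serves several tokens, which is why production throughput is higher; the sketch only illustrates why bandwidth, not FLOPS, sets the ceiling for a single stream.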

80×: AI chip compute growth over 10 years

17×: Memory bandwidth growth over the same period

~1 month: Time to generate 100M tokens on today's hardware (at ~40 tokens/sec)

The real problem is how far data has to travel

Here is how current AI chip architecture works, in simple terms: the model's weights sit in high-bandwidth memory (HBM) beside the GPU, and for every token generated they are streamed from that memory into the GPU's compute cores, used, and the results written back. This round trip repeats for every single token, and the journey itself consumes time and energy.

What Fractile has been building since 2022 is a way to eliminate that journey: an "in-memory compute" architecture in which calculations happen directly inside the SRAM, with compute logic placed alongside the memory cells. Matrix multiplications never leave memory; they are processed inside it, and only the results come out.
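The difference in data movement can be illustrated with a toy accounting of one matrix-vector product, the core operation behind each generated token. This is a hypothetical, idealized model, not Fractile's actual design: it assumes 2-byte elements, treats movement inside the memory array as free, and ignores whether the weights fit in on-chip SRAM.

```python
# Idealized data-movement count for one matrix-vector product y = W @ x.
# Hypothetical model: every element is 2 bytes, and traffic that stays
# inside the memory array is treated as free.


def bytes_moved_conventional(rows: int, cols: int, elem_bytes: int = 2) -> int:
    """Weights W (rows x cols) and input x travel to the compute units;
    output y travels back."""
    return (rows * cols + cols + rows) * elem_bytes


def bytes_moved_in_memory(rows: int, cols: int, elem_bytes: int = 2) -> int:
    """W stays where it is stored; only x goes in and y comes out."""
    return (cols + rows) * elem_bytes


if __name__ == "__main__":
    rows, cols = 8192, 8192  # hypothetical layer size
    conv = bytes_moved_conventional(rows, cols)
    imc = bytes_moved_in_memory(rows, cols)
    print(f"conventional: {conv:,} bytes, in-memory: {imc:,} bytes")
    print(f"reduction: {conv / imc:,.0f}x")
```

For a hypothetical 8192 × 8192 layer, keeping the weights in place cuts the bytes crossing the memory boundary by roughly 4,000×, because only the input and output vectors move. Real designs face costs this sketch leaves out, such as the limited capacity of on-chip SRAM.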

"Faster speed is not just about going from 10 seconds to 100 milliseconds. It is about going from weeks, months — down to something much, much shorter."
— Walter Goodwin, Fractile CEO

By the numbers: today's advanced AI systems can generate up to 100 million tokens while solving a complex problem, but at ~40 tokens per second on current hardware, that takes about a month. Fractile's target is 1,200 tokens per second, which would bring the same task down to roughly a day (100M ÷ 1,200 ≈ 83,000 seconds). The company claims its design could be 25× faster at one-tenth the cost of current GPU setups.

	Current GPU approach	In-memory computing
Data flow	HBM → GPU compute cores → HBM (repeated per token)	Computation completes inside memory
Bottleneck	Memory bandwidth ceiling (3–8 TB/s)	Minimizes data movement
100M-token task	~1 month (40 tokens/sec)	About a day (1,200 tokens/sec target)
Cost target	Baseline	1/10th the cost (Fractile claim)
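The time figures for the 100M-token task follow from simple division. A minimal sketch, using the article's token budget and rates and assuming a constant decode rate with no overhead:

```python
# Wall-clock time to generate a fixed token budget at a steady decode rate.
# Token budget and rates come from the article; a constant rate with no
# overhead is a simplifying assumption.

SECONDS_PER_DAY = 86_400


def days_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Days needed to emit `tokens` at a constant `tokens_per_second`."""
    return tokens / tokens_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    budget = 100_000_000  # the 100M-token task
    for label, rate in [("current GPUs", 40), ("Fractile target", 1_200)]:
        print(f"{label}: {days_to_generate(budget, rate):.1f} days")
```

At 40 tokens/sec the task takes about 29 days; at the 1,200 tokens/sec target it takes just under one day, a 30× reduction.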

$355M landed on the same bet within a month

Fractile's $220M round in May 2026 got attention. But at the end of the same month, Korean chip startup XCENA also raised $135M at a $570M valuation. Their approaches differ: Fractile computes inside SRAM, while XCENA's MX1 chip uses CXL (Compute Express Link) to place processing power right next to DRAM. But the diagnosis is identical.

In XCENA's own words: "Inference is not just a compute problem; it is increasingly a memory scaling problem." Teams in Seoul and London independently reached the same conclusion.

The investor lineup is telling, too. Fractile is backed by Peter Thiel's Founders Fund and former Intel CEO Pat Gelsinger. Anthropic is reportedly in early discussions to buy Fractile chips once they ship. Anthropic currently sources compute from three suppliers: NVIDIA GPUs, Google TPUs, and Amazon Trainium. Fractile could become the fourth. The AI inference market is projected to grow from ~$103B in 2025 to ~$255B by 2030.
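The market projection implies an annual growth rate the article does not state. A quick compound annual growth rate (CAGR) calculation, treating 2025 to 2030 as five years of growth:

```python
# Implied compound annual growth rate (CAGR) behind the projection:
# ~$103B in 2025 to ~$255B in 2030, i.e. five years of compounding.


def cagr(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(103, 255, 2030 - 2025)
    print(f"implied CAGR: {rate:.1%}")
```

The projection works out to roughly 20% growth per year.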

NVIDIA knows this too

NVIDIA has been raising memory bandwidth as well: the H200 delivers 43% more than the H100, and Blackwell pushes it significantly further. But Fractile and XCENA are not targeting more memory bandwidth inside a GPU; they are targeting the unification of memory and compute. NVIDIA will dominate in the short term, but the long-term architectural bet is being placed right now.

What to do before 2027

Fractile's chip will not arrive until 2027. XCENA targets production by end of 2026. You can start preparing for this shift today.

Factor the cost decline curve into your AI planning
Per-token pricing for GPT, Claude, and Gemini tends to fall as infrastructure costs fall. If an AI project's ROI does not pencil out today, recalculate it using expected 2027–2028 pricing. Things that are not economically viable now may become viable then.
Design long-context workflows in advance
The workloads Fractile targets are "100M+ token deep reasoning" tasks. Claude's 200K-token and Gemini's 1M-token context windows are available now but expensive to fill. Access could become dramatically cheaper after 2027, so map out today the processes that would benefit from long context.
Revisit your speed-vs-cost tradeoffs
Cost-optimized options, such as batch processing, typically deliver slower AI responses. That tradeoff should narrow after 2027. List the use cases you abandoned for speed reasons, and have them ready when infrastructure costs drop.
Watch for vendor lock-in
Anthropic eyeing Fractile as a fourth compute supplier signals that an era of AI infrastructure diversification has begun. More supplier diversity means more price competition, so be careful about contracts that lock you into a single vendor today.
Set an H2 2027 AI workflow checkpoint
Both Fractile and XCENA target production in 2026–2027. Mark H2 2027 as your team's AI infrastructure and cost review point. Use cases that do not deliver ROI today may be ready to go at that checkpoint.
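Recalculating ROI under future pricing comes down to re-running the same cost check with different price assumptions. A minimal sketch of that check; every figure in it (token volume, monthly value, price per million tokens, and the price-drop scenarios) is a hypothetical placeholder, not a quoted or forecast price, and it ignores integration and staffing costs:

```python
# Hypothetical ROI re-check for an AI use case under price scenarios.
# All numbers are illustrative placeholders: substitute your own token
# volume, vendor price, and value estimate. Integration and staffing
# costs are ignored.
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    tokens_per_month: float          # expected monthly token volume
    value_per_month: float           # estimated business value, dollars
    price_per_million_tokens: float  # today's blended price, dollars

    def monthly_cost(self, price_multiplier: float = 1.0) -> float:
        """Token spend per month if prices are scaled by price_multiplier."""
        return (self.tokens_per_month / 1e6
                * self.price_per_million_tokens * price_multiplier)

    def pencils_out(self, price_multiplier: float = 1.0) -> bool:
        """True when estimated value exceeds token spend."""
        return self.value_per_month > self.monthly_cost(price_multiplier)


if __name__ == "__main__":
    case = UseCase("long-document review", tokens_per_month=2e9,
                   value_per_month=15_000, price_per_million_tokens=10.0)
    # Scenario multipliers are assumptions, not forecasts.
    for label, mult in [("today", 1.0), ("-50%", 0.5), ("-75%", 0.25)]:
        cost = case.monthly_cost(mult)
        print(f"{label}: ${cost:,.0f}/month, ROI positive: {case.pencils_out(mult)}")
```

With these placeholder numbers the use case loses money at today's price but becomes viable if prices halve. Swapping in real volumes and vendor prices at the review checkpoint turns the list of shelved use cases into a concrete go/no-go list.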

Want to dig deeper?

UK AI chip startup Fractile raises $220M to tackle the growing inference bottleneck

Original Series B article — founder interview, full investor list

Fractile's $220m round arrives as Anthropic eyes its UK silicon

Deep dive on Anthropic relationship and in-memory compute architecture

XCENA Secures $135M Betting on Memory as AI's Real Bottleneck

Korean startup XCENA's CXL approach and MX1 chip breakdown

AI's Memory Wall Problem: Why More GPUs Don't Fix Inference Latency

Technical explanation of the memory wall and GPU bandwidth comparison table

LLM Inference Hardware: An Enterprise Guide to Key Players

Full 2026 AI inference hardware market map with enterprise guidance

FAQ

If Fractile chips ship, will my AI service costs drop immediately?

Not immediately. Service providers such as Anthropic first have to adopt the chips; their internal costs then fall, and pricing updates follow. Realistic timeline: production in 2027, with an impact on service pricing in 2028.

Is this replacing NVIDIA or supplementing it?

In the short term, it supplements NVIDIA. Fractile and XCENA are optimized for long-context, large-scale inference workloads. Rather than replacing general-purpose GPUs, they are positioned as highly efficient alternatives for specific workloads. In the long term, they may signal a deeper architectural shift.

Is in-memory computing a new concept?

No. It has been researched for decades. What makes Fractile different is designing it to be commercially viable at the scale of LLM matrix operations. Engineers from ARM and AI labs have been developing the chip design and the AI software stack together.

Will all AI services become faster in 2027?

Not all at once. Fractile's chip is optimized for complex tasks requiring deep reasoning. The biggest impact will come in scientific research, complex coding, and long-document processing: anything requiring millions of tokens.

Does this matter for the Korean AI industry?

Yes. SK Hynix and Samsung supply HBM (high-bandwidth memory), so a paradigm shift toward in-memory computing connects directly to Korea's memory semiconductor industry. Which architectures become standard will shape the strategic direction of Korean chipmakers.

Written by Rush

Tracking where business meets AI.
