DeepL Voice API — How to Plug Real-Time Translation into Contact Center Calls

A customer calls in Spanish, but the agent only speaks Korean. In the past, the call would end with "Let me transfer you to a Spanish-speaking agent." Now the agent can keep the call.

3-Second Summary

Customer voice input → WebSocket streaming → Real-time speech recognition → Up to 5 languages translated simultaneously → Subtitles on agent's screen

What Is This?

DeepL officially launched its Voice API in February 2026. In short, you stream audio in and get back real-time speech recognition and translation at the same time. Think of it as the voice counterpart to DeepL's existing text translation API.

DeepL is an AI translation company founded in Cologne, Germany. It raised $300 million in May 2024 at a $2 billion valuation, and as of late 2024 it had $185 million in annual revenue and 1,570 employees. The company takes translation accuracy seriously: in blind tests, language experts preferred DeepL's translations 1.3x more often than Google's and 2.3x more often than Microsoft's.

The Voice API has three key features:

WebSocket-Based Real-Time Streaming
Instead of HTTP request-response, it uses a persistent WebSocket connection: audio streams in continuously and translations stream back out, which keeps latency very low.
Simultaneous 5-Language Translation
A single audio stream can be translated into up to 5 target languages at once. This means conference call participants can each receive captions in their native language.
Voice-to-Voice Real-Time Interpretation (Early Access)
Instead of text, it plays back translated audio directly. Agents hear the customer's words in their own language in real time.
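To make the five-language fan-out concrete, here is a minimal Python sketch of the caption-routing side: derive a stream's target languages from what each call participant speaks, then hand each participant the caption in their language. The result message shape, the language codes, and the helper names (`normalize_targets`, `targets_for`, `route_captions`) are hypothetical illustrations, not DeepL's documented schema.

```python
MAX_TARGET_LANGS = 5  # per-stream limit described above


def normalize_targets(langs):
    """Uppercase and deduplicate language codes (first-seen order) and enforce the 5-language cap."""
    targets = []
    for code in langs:
        code = code.upper()
        if code not in targets:
            targets.append(code)
    if len(targets) > MAX_TARGET_LANGS:
        raise ValueError(
            f"at most {MAX_TARGET_LANGS} target languages per stream, got {len(targets)}"
        )
    return targets


def targets_for(participants):
    """Derive the stream's target languages from each participant's preferred language."""
    return normalize_targets(participants.values())


def route_captions(result, participants):
    """Pick the caption each participant should see.

    `result` uses a HYPOTHETICAL shape: {"translations": {"KO": "...", "JA": "..."}}.
    Participants whose language is missing from this message get nothing.
    """
    translations = result.get("translations", {})
    return {
        name: translations[lang.upper()]
        for name, lang in participants.items()
        if lang.upper() in translations
    }


participants = {"agent_seoul": "ko", "supervisor": "EN-US", "agent_tokyo": "ja"}
print(targets_for(participants))  # ['KO', 'EN-US', 'JA']
```

Deriving the targets from the participant list (rather than hard-coding them) means one audio stream serves everyone on the call, and the cap check fails early if a sixth language joins.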

The target customers are clear, with contact centers and BPO (business process outsourcing) companies first in line. Calls that used to be transferred because of a language barrier, and global meetings that required hired interpreters, are scenarios that can use it right away.

What Makes It Different?

DeepL isn't the only player in real-time voice translation. Google Cloud Speech-to-Text, Microsoft Azure Speech, and the OpenAI Realtime API all compete in this space. The difference is in the approach.

	Traditional (Manual/Sequential Translation)	DeepL Voice API
Processing	Record → STT → Translate → Deliver (sequential)	Real-time streaming (simultaneous)
Latency	Several seconds to tens of seconds	Sub-second low latency
Translation accuracy	General-purpose models	Preferred 1.3x more often than Google in expert blind tests
Simultaneous languages	1	Up to 5 simultaneously
Integration method	REST API (request-response)	WebSocket (bidirectional streaming)
Post-editing burden	Google output needs 2x as many edits as DeepL	Fewest edits (GPT-4 output needs 3x as many)

Here's how it compares with specific competitors:

Tool	Strength	Weakness	Voice Translation
DeepL Voice API	Top-tier translation accuracy, 5 simultaneous languages	Enterprise-focused, voice pricing undisclosed	STT + Translation + Voice-to-Voice
Google Cloud STT + Translate	125 languages, affordable	Translation quality lower than DeepL	STT → Translation (separate APIs)
Microsoft Azure Speech	Native Teams integration	Experts preferred DeepL 2.3x more often in blind tests	STT + Translation integrated
OpenAI Realtime API	Strong for conversational AI agents	Not a translation-specialized tool	Voice I/O (not translation-focused)
Sanas	Accent transformation specialist, adopted by 20 BPOs	Neutralizes accents; does not translate	Accent conversion (not translation)

According to a Forrester study, companies using DeepL cut translation time by 90%, reduced workload by 50%, and achieved a 345% ROI. Those figures come from text translation, but the Voice API could bring similar gains to voice-based workflows.

Real Deployment Examples

IT consulting firm Inetum uses DeepL Voice to distribute internal support teams across countries, supporting all employees regardless of language. Global bakery company Brioche Pasquier reported that after deploying Voice for Meetings, "collaboration barriers between country sites disappeared."

The Essentials: How to Start with DeepL Voice API

Check your API plan
Voice API is available on DeepL API Pro ($5.49/mo base) and above. Enterprise subscribers get direct access via the v3 endpoint.
Open a WebSocket session
A POST to v3/voice/realtime returns a temporary streaming URL and an auth token. The token is single-use.
Start audio streaming
Open a WebSocket connection to the received URL and send a mono audio stream. Audio must be sent within 30 seconds to keep the connection alive.
Receive translation results
Original language transcription and target language translation come back in real time. Continuous sessions up to 1 hour are supported.
Integrate into existing systems
Display the translations as subtitles or live text in your contact center software, CRM, or video conferencing tool.
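Steps 3 and 4 impose two timing limits on every session: audio must arrive within 30 seconds, and a continuous session tops out at 1 hour. A minimal Python sketch of a timer that tracks both is shown below. `SessionTimer` and its margins are hypothetical, not part of any DeepL SDK, and what to send when `needs_keepalive()` fires (for example, a short chunk of silent audio) is an assumption to check against the official docs.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

IDLE_LIMIT_S = 30.0       # audio must arrive within 30 s to keep the connection alive
SESSION_LIMIT_S = 3600.0  # continuous sessions are supported for up to 1 hour


@dataclass
class SessionTimer:
    """Hypothetical helper (not part of any DeepL SDK) that tracks both limits."""

    clock: Callable[[], float] = time.monotonic
    started: float = field(init=False)
    last_audio: float = field(init=False)

    def __post_init__(self) -> None:
        now = self.clock()
        self.started = now
        self.last_audio = now

    def mark_audio_sent(self) -> None:
        """Call after every chunk you push over the WebSocket."""
        self.last_audio = self.clock()

    def needs_keepalive(self, margin: float = 5.0) -> bool:
        """True once the idle gap is within `margin` seconds of the 30 s limit."""
        return self.clock() - self.last_audio >= IDLE_LIMIT_S - margin

    def should_rotate(self, margin: float = 60.0) -> bool:
        """True once the session is within `margin` seconds of the 1-hour cap,
        i.e. time to request a new session (and a new single-use token)."""
        return self.clock() - self.started >= SESSION_LIMIT_S - margin
```

Injecting the clock keeps the class testable without sleeping, and rotating a minute before the cap gives you time to open the next session before the current one closes.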

Note

The official DeepL SDK doesn't yet include Voice API integration. You'll need to use WebSocket client libraries directly. The DeepL CLI tool does support Voice API.
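Without SDK support, the session-opening call from the steps above has to be made by hand. Here is a minimal Python sketch that builds that POST with the standard library only. The `api.deepl.com` host and the `DeepL-Auth-Key` header follow DeepL's existing REST API; the JSON field names (`source_lang`, `target_langs`) are hypothetical placeholders, so take the real schema from the Voice API reference.

```python
import json
import urllib.request

# Assumption: the Pro/Enterprise host that DeepL's other REST endpoints use.
API_BASE = "https://api.deepl.com"


def build_session_request(auth_key: str, source_lang: str, target_langs: list) -> urllib.request.Request:
    """Build (but don't send) the POST that opens a Voice API streaming session.

    The path comes from the steps above. The JSON field names are HYPOTHETICAL
    placeholders; take the real schema from the Voice API reference.
    """
    body = {
        "source_lang": source_lang,    # hypothetical field name
        "target_langs": target_langs,  # hypothetical field name (up to 5 codes)
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v3/voice/realtime",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Same auth header format DeepL's text API uses.
            "Authorization": f"DeepL-Auth-Key {auth_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_session_request("YOUR_AUTH_KEY", "ES", ["KO"])
print(req.get_method(), req.full_url)  # POST https://api.deepl.com/v3/voice/realtime
```

Passing `req` to `urllib.request.urlopen` would send it. Per the steps above, the response carries the temporary streaming URL and single-use token, which you then hand to whichever WebSocket client library you choose.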

Beyond Voice — The Full DeepL Platform Picture

Looking at Voice API in isolation misses the bigger picture. DeepL is currently building a full platform: Translation API → Write API → Voice API.

DeepL Voice for Meetings

Real-time translated captions in Microsoft Teams and Zoom. Each participant sees captions in their native language. Meeting data is processed in memory only and deleted after the session ends.

DeepL Voice for Conversations

A mobile solution for 1:1 in-person conversations. Split View lets both parties see translations simultaneously on one device.

DeepL Voice API

An API for developers to integrate directly into their apps. Embed voice translation into contact centers, CRMs, and custom platforms.

Want to Dig Deeper?

DeepL Voice API Official Documentation

Everything developers need: WebSocket connections, audio formats, session management.

DeepL Voice Product Page

See the differences between Meetings, Conversations, and API models at a glance with demos.

Introducing DeepL Voice — Blog

The birth story and vision of the Voice product line, explained directly by DeepL.

DeepL Next-Gen LLM Translation Accuracy Analysis

Blind test results vs Google and GPT-4 with per-language performance data.

The Borderless Contact Center — DeepL Blog

Strategy and case studies for building multilingual customer support teams with real-time translation.

DeepL CLI — GitHub

Official CLI tool to test Translate, Write, and Voice APIs from the command line.

FAQ

How much does the Voice API cost? Is it per-character like the text translation API?

Voice API is available on DeepL API Pro ($5.49/mo base) and above, but voice translation pricing hasn't been publicly disclosed yet. Enterprise subscribers can get custom quotes from the sales team. Unlike the text API, which bills per character, it will likely be billed by streaming time.

Does it support Korean speech recognition? How's the accuracy?

Yes, it supports speech recognition for 13 languages including Korean. Real-time subtitle translation is available for all 33 languages supported by DeepL's translator. Korean-English translation quality improved 1.7x with DeepL's next-gen model.

How do I integrate it with existing contact center software (Genesys, Zendesk, etc.)?

Since the Voice API is WebSocket-based, you'd build middleware that forwards your contact center software's audio stream via WebSocket and displays translation results on the agent's screen. Official SDK integration isn't available yet, but you can develop immediately using the DeepL CLI or standard WebSocket libraries.
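The middleware described above boils down to two loops running at the same time. Below is a minimal asyncio sketch, assuming you already have an open WebSocket and access to the call's audio: `send`, `recv`, `show`, and the audio iterator are hypothetical hooks you would wire to your WebSocket library, your contact center platform's media stream (Genesys, Zendesk, etc.), and your agent UI. The result message shape is also an assumption.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, Optional


async def pump_audio(
    audio_chunks: AsyncIterator[bytes],
    send: Callable[[bytes], Awaitable[None]],
) -> None:
    """Forward raw audio from the contact-center platform to the translation socket."""
    async for chunk in audio_chunks:
        await send(chunk)


async def pump_captions(
    recv: Callable[[], Awaitable[Optional[str]]],
    show: Callable[[str, str], None],
) -> None:
    """Push every translation in each result message to the agent's screen.

    Message shape is HYPOTHETICAL: {"translations": {"KO": "..."}}.
    `recv` returning None stands for "socket closed".
    """
    while (raw := await recv()) is not None:
        msg = json.loads(raw)
        for lang, text in msg.get("translations", {}).items():
            show(lang, text)


async def bridge(audio_chunks, send, recv, show) -> None:
    """Run both directions at once, which a WebSocket allows and request-response does not."""
    await asyncio.gather(pump_audio(audio_chunks, send), pump_captions(recv, show))
```

Running the two pumps under `asyncio.gather` is the point of the design: captions reach the agent while the customer is still talking, instead of after an upload-and-wait round trip.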

Is meeting recording data stored on DeepL servers? I'm concerned about security.

DeepL processes all voice data temporarily, in memory only, and deletes it as soon as the session ends. Data is encrypted in transit, and DeepL explicitly states that customer data is never used to train its AI models. Processing on GDPR-compliant EU servers is another plus.

Written by Kevin

Dissecting AI tools and workflows from a developer's lens.
