A customer calls in Spanish, but the agent speaks only Korean. Until now, that call would end with "Let me transfer you to a Spanish-speaking agent." Now you don't have to give up the call.

3-Second Summary
Customer voice input → WebSocket streaming → Real-time speech recognition → Up to 5 languages translated simultaneously → Subtitles on agent's screen

What Is This?

DeepL officially launched its Voice API in February 2026. In short, it's an API that streams audio in and returns real-time speech recognition + translation simultaneously. Think of it as the voice version of DeepL's existing text translation API.

DeepL is an AI translation company that started in Cologne, Germany. It raised $300 million in May 2024 at a $2 billion valuation. As of late 2024, annual revenue was $185 million and headcount was 1,570. The company is serious about translation accuracy: in blind tests, language experts preferred DeepL translations 1.3x as often as Google's and 2.3x as often as Microsoft's.

The Voice API has three key features:

  1. WebSocket-Based Real-Time Streaming
    Instead of HTTP request-response, the API uses a persistent WebSocket connection: you keep streaming audio in, and translations keep flowing out with sub-second latency.
  2. Simultaneous 5-Language Translation
    A single audio stream can be translated into up to 5 target languages at once. This means conference call participants can each receive captions in their native language.
  3. Voice-to-Voice Real-Time Interpretation (Early Access)
    Instead of text, it plays back translated audio directly. Agents hear the customer's words in their own language in real time.
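
As a rough illustration of the five-language limit, a client might check its target list before opening a session. This is only a sketch: the function name `normalize_targets`, the constant `MAX_TARGET_LANGUAGES`, and the upper-case language-code format are assumptions made for this example, not part of DeepL's documented API.

```python
MAX_TARGET_LANGUAGES = 5  # limit stated in the article; the constant name is hypothetical


def normalize_targets(targets):
    """Upper-case, de-duplicate (keeping order), and enforce the 5-language cap."""
    seen = []
    for code in targets:
        code = code.strip().upper()
        if code and code not in seen:
            seen.append(code)
    if not seen:
        raise ValueError("at least one target language is required")
    if len(seen) > MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"{len(seen)} target languages requested; the limit is {MAX_TARGET_LANGUAGES}"
        )
    return seen
```

For example, `normalize_targets(["ko", "EN", "ko "])` returns `["KO", "EN"]`, while a list of six distinct codes raises `ValueError` before any audio is sent.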

The target customers are clear. Contact centers and BPO (business process outsourcing) companies are priority #1. Calls that used to be transferred because of a language barrier and global meetings that used to require hired interpreters are scenarios where it can be put to work immediately.

What Makes It Different?

Real-time voice translation isn't DeepL's exclusive territory. Google Cloud Speech-to-Text, Microsoft Azure Speech, and the OpenAI Realtime API are all competitors. But DeepL's approach is different.

| | Traditional (Manual/Sequential Translation) | DeepL Voice API |
| --- | --- | --- |
| Processing | Record → STT → Translate → Deliver (sequential) | Real-time streaming (simultaneous) |
| Latency | Several seconds to tens of seconds | Sub-second low latency |
| Translation accuracy | General-purpose models | Expert blind test 1.3x (vs Google) |
| Simultaneous languages | 1 | Up to 5 simultaneously |
| Integration method | REST API (request-response) | WebSocket (bidirectional streaming) |
| Post-editing burden | 2x edits needed vs Google | Minimal edits (3x fewer than GPT-4) |

Let's also compare by competing tool:

| Tool | Strength | Weakness | Voice Translation |
| --- | --- | --- | --- |
| DeepL Voice API | Top-tier translation accuracy, 5 simultaneous languages | Enterprise-only, pricing undisclosed | STT + Translation + Voice-to-Voice |
| Google Cloud STT + Translate | 125 languages, affordable | Translation quality lower than DeepL | STT → Translation (separate APIs) |
| Microsoft Azure Speech | Native Teams integration | Preferred 2.3x less often than DeepL in expert blind tests | STT + Translation integrated |
| OpenAI Realtime API | Strong for conversational AI agents | Not a translation-specialized tool | Voice I/O (not translation-focused) |
| Sanas | Accent transformation specialist, adopted by 20 BPOs | Accent neutralization, not translation | Accent conversion (not translation) |

According to Forrester research, companies using DeepL achieved 90% reduction in translation time, 50% workload reduction, and 345% ROI. This includes text translation numbers, but adding Voice API should boost voice-based workflow efficiency even further.

Real deployment examples

IT consulting firm Inetum uses DeepL Voice to distribute internal support teams across countries, supporting all employees regardless of language. Global bakery company Brioche Pasquier reported that after deploying Voice for Meetings, "collaboration barriers between country sites disappeared."

The Essentials: How to Start with DeepL Voice API

  1. Check your API plan
    Voice API is available on DeepL API Pro ($5.49/mo base) and above. Enterprise subscribers get direct access via the v3 endpoint.
  2. Open a WebSocket session
    POST v3/voice/realtime returns a temporary streaming URL + auth token. This token is single-use.
  3. Start audio streaming
    Open a WebSocket connection to the received URL and send a mono audio stream. Audio must be sent within 30 seconds to keep the connection alive.
  4. Receive translation results
    Original language transcription and target language translation come back in real time. Continuous sessions up to 1 hour are supported.
  5. Integrate into existing systems
    Display translation results as subtitles or real-time text in your contact center software, CRM, or video conferencing tools — and you're done.
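
The steps above can be sketched with the standard library alone. Treat this as an outline under stated assumptions rather than DeepL's reference client: the host `api.deepl.com`, the JSON field names `source_language` and `target_languages`, the 16 kHz 16-bit PCM format, and the helper names are all assumptions for illustration. Only the `v3/voice/realtime` path, the 30-second audio window, and the 1-hour session limit come from the steps themselves. The request is built but never sent, so nothing here touches the network.

```python
import json
import urllib.request

API_BASE = "https://api.deepl.com"  # assumed host for the Voice API
IDLE_LIMIT_S = 30                   # from the steps: audio must arrive within 30 seconds
SESSION_LIMIT_S = 3600              # from the steps: sessions last up to 1 hour


def build_session_request(auth_key, source_lang, target_langs):
    """Step 2: build (but do not send) the POST that asks for a streaming URL and token."""
    body = json.dumps({
        "source_language": source_lang,    # assumed field name
        "target_languages": target_langs,  # assumed field name
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v3/voice/realtime",
        data=body,
        method="POST",
        headers={
            "Authorization": f"DeepL-Auth-Key {auth_key}",
            "Content-Type": "application/json",
        },
    )


def pcm_chunks(pcm, sample_rate=16000, chunk_ms=100, sample_width=2):
    """Step 3: split mono PCM bytes into fixed-duration chunks for streaming."""
    size = sample_rate * sample_width * chunk_ms // 1000
    if size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


class SessionClock:
    """Tracks both time limits using timestamps (in seconds) supplied by the caller."""

    def __init__(self, started_at):
        self.started_at = started_at
        self.last_audio_at = started_at

    def mark_audio(self, now):
        """Record that an audio chunk was sent at time `now`."""
        self.last_audio_at = now

    def must_send_audio_by(self):
        """Latest time the next chunk can go out, reading the 30 s rule as a gap limit."""
        return self.last_audio_at + IDLE_LIMIT_S

    def expired(self, now):
        """True once the 1-hour session limit or the 30-second audio gap is reached."""
        return (now - self.started_at >= SESSION_LIMIT_S
                or now - self.last_audio_at >= IDLE_LIMIT_S)
```

In a real client you would send the request with `urllib.request.urlopen`, open a WebSocket to the returned URL, and call `mark_audio` each time a chunk goes out. Because the token is single-use, a session that `expired` reports as dead needs a fresh POST, not a reconnect with the old token.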

Note

The official DeepL SDK doesn't yet include Voice API integration. You'll need to use WebSocket client libraries directly. The DeepL CLI tool does support Voice API.
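
Since you have to bring your own WebSocket client, one option is to keep the session logic independent of any particular library. In the sketch below, `send` and `recv` are stand-ins for whatever async client you choose, and the JSON message shape (`language`, `text`) is hypothetical; swap in the fields DeepL actually documents. The `None` sentinel for end of stream is likewise a convention of this example only.

```python
import asyncio
import json


async def send_audio(send, chunks):
    """Forward each audio chunk through the caller-supplied async `send` callable."""
    for chunk in chunks:
        await send(chunk)


async def collect_subtitles(recv, subtitles):
    """Read JSON messages until `recv` returns None; keep the latest text per language."""
    while True:
        raw = await recv()
        if raw is None:        # sentinel used by this sketch to signal end of stream
            return subtitles
        msg = json.loads(raw)  # hypothetical shape: {"language": ..., "text": ...}
        subtitles[msg["language"]] = msg["text"]


async def run_session(send, recv, chunks):
    """Run sending and receiving concurrently, as a bidirectional stream requires."""
    subtitles = {}
    await asyncio.gather(
        send_audio(send, chunks),
        collect_subtitles(recv, subtitles),
    )
    return subtitles
```

Passing the transport in as two callables also makes the logic easy to exercise with fakes before any real connection exists.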

Beyond Voice — The Full DeepL Platform Picture

Looking at Voice API in isolation misses the bigger picture. DeepL is currently building a full platform: Translation API → Write API → Voice API.

DeepL Voice for Meetings

Real-time translated captions in Microsoft Teams and Zoom. Each participant sees captions in their native language. Meeting data is processed in memory only and deleted after the session ends.

DeepL Voice for Conversations

A mobile solution for 1:1 in-person conversations. Split View lets both parties see translations simultaneously on one device.

DeepL Voice API

An API for developers to integrate directly into their apps. Embed voice translation into contact centers, CRMs, and custom platforms.