Just 10 minutes of recordings brought a lost voice back. Actor Eric Dane, who lost his voice to ALS, began telling his story again in his own voice using ElevenLabs' voice restoration technology. Then in February 2026, ElevenLabs launched Eleven v3, its most expressive TTS model yet, resetting the standard for AI voice synthesis. We've moved beyond mere "reading aloud" to an era where text alone can produce voices that whisper, laugh, and sigh.

TL;DR
- Eleven v3 launched (70+ languages)
- Audio Tags for emotion & non-verbal control
- Text to Dialogue API (multi-voice)
- 11 Voices: voice restoration pledged for 1M people with voice loss

What is this?

ElevenLabs v3 has two stories happening at once. One about technology, one about people.

The technology story: the Eleven v3 model. ElevenLabs' latest voice synthesis model, launched on February 12, 2026. While the previous model (Multilingual v2) focused on "reading naturally," v3 was built to make voices "act." Three key changes stand out.

First, Audio Tags. You can insert emotion or action cues in brackets within the text. Add tags like [whispers], [excited], [sighs], [laughs] and the model adjusts tone and pace accordingly. It even supports sound effect tags like [gunshot] and [explosion], reducing the need to separately edit sound effects for audiobooks or game dialogue.
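Tagging works straight through the standard text-to-speech REST endpoint. Here's a minimal Python sketch, assuming the requests library, placeholder API key and voice ID values, and the "eleven_v3" model ID string (worth verifying against the current ElevenLabs docs):

```python
# Minimal sketch: Eleven v3 Audio Tags through the text-to-speech REST endpoint.
# YOUR_API_KEY and YOUR_VOICE_ID are placeholders; the "eleven_v3" model ID
# string is an assumption, so check the current ElevenLabs docs.
import requests

API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE_ID"  # any voice from your ElevenLabs voice library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        # Audio Tags ride inline in the text, in square brackets.
        "text": "[whispers] It's a secret... [laughs] actually, it's nothing.",
        "model_id": "eleven_v3",
    },
)
resp.raise_for_status()

with open("secret.mp3", "wb") as f:
    f.write(resp.content)  # the endpoint returns raw audio bytes (MP3 by default)
```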

Second, Text to Dialogue API. An API that weaves multiple voices into a single conversation. Specify up to 10 unique voices, and it generates natural dialogue where each character reacts to others' speech patterns. Usable anywhere multi-character audio is needed: podcasts, audiobooks, game dialogue.
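A hedged sketch of what a dialogue request can look like; the /v1/text-to-dialogue path and the "inputs" payload shape here are assumptions drawn from the launch material, so check the API reference before relying on them:

```python
# Sketch: several speakers, one request, one stitched audio file.
# The endpoint path and "inputs" schema are assumptions; verify in the docs.
import requests

API_KEY = "YOUR_API_KEY"
HOST_A, HOST_B = "VOICE_ID_A", "VOICE_ID_B"  # two voices from your library

payload = {
    "model_id": "eleven_v3",
    "inputs": [  # each entry pairs one line of dialogue with one voice
        {"voice_id": HOST_A, "text": "[excited] You won't believe this."},
        {"voice_id": HOST_B, "text": "[sighs] Go on, then."},
        {"voice_id": HOST_A, "text": "[whispers] We shipped on time."},
    ],
}
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-dialogue",
    headers={"xi-api-key": API_KEY},
    json=payload,
)
resp.raise_for_status()
open("dialogue.mp3", "wb").write(resp.content)  # single stitched audio file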

Third, 70+ language support. Broad coverage including Korean, Japanese, Chinese, Arabic, and other Asian and Middle Eastern languages, with automatic accent adjustment based on text content.

At a glance: 70+ supported languages · #1 in blind listening tests · 2.83% word error rate (industry lowest)

In independent blind listening tests, ElevenLabs placed first with 37 votes. Second place got 19. The word error rate (WER) of 2.83% is industry-leading.

The human story: the 11 Voices project. A docuseries unveiled at SXSW on March 11, 2026. Eleven people who lost their voices to ALS, cerebral palsy, and other conditions narrate their own stories using AI-restored versions of their voices. Actor Eric Dane regained his voice through ElevenLabs technology while battling ALS, and his wife Rebecca Gayheart Dane became the spokesperson for this project.

ElevenLabs co-founder Mati Staniszewski said: "When someone loses their voice, they lose their independence and their connection to the people they love." With just 10 minutes of past recordings, the company can create a nearly indistinguishable digital voice and integrate it with assistive devices for everyday conversation.

1 Million Voices campaign

ElevenLabs has pledged to provide free voice restoration technology to 1 million people experiencing voice loss, an in-kind donation valued at $1 billion. About 7,000 people have been supported so far through 800+ nonprofit partners across 49 countries. The official trailer was narrated by Sir Michael Caine using an ElevenLabs voice.

What changes?

The AI TTS market is full of choices now, so what matters is what v3 actually changes.

| | Previous TTS (v2 generation) | Eleven v3 |
| --- | --- | --- |
| Emotional expression | Flat tone, lacking nuance | Real-time emotion & non-verbal control via Audio Tags |
| Multi-speaker | Generate individually, edit manually | Natural dialogue generated at once via Text to Dialogue API |
| Languages | 29 (Multilingual v2) | 70+ with automatic accent adaptation |
| Non-verbal expression | Not possible | Inline tags: [laughs], [sighs], [whispers], etc. |
| Sound effects | Separate editing needed | Insert via tags: [gunshot], [explosion], etc. |
| Character limit | 10,000 chars (~10 min) | 5,000 chars (~5 min), quality-first design |
| Technical approach | Prosody-based synthesis | Context-aware expressive modeling |

According to CloudThat's technical analysis, v3's core architectural change is a "shift from prosody-based synthesis to context-aware expressive modeling." Emotion and intent are embedded in the generated tokens themselves, not added as post-processing effects. This keeps emotions consistent even across long texts.

There are trade-offs, of course. v3's character limit is 5,000, shorter than v2's 10,000 or Flash v2.5's 40,000. Compute costs are higher too. So ElevenLabs divided models by use case: v3 for premium content where expressiveness matters, v2 for general narration, and Flash v2.5 (latency ~75ms) for real-time conversation.
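In code, that split can be as simple as a lookup. A sketch, assuming the model ID strings eleven_v3, eleven_multilingual_v2, and eleven_flash_v2_5 (confirm the exact names in the docs):

```python
# Sketch of the use-case split described above. The model ID strings are
# assumptions based on ElevenLabs' public naming; confirm them before use.
MODEL_FOR_USE_CASE = {
    "expressive": "eleven_v3",               # premium, tag-driven acting
    "narration":  "eleven_multilingual_v2",  # stable long-form reading
    "realtime":   "eleven_flash_v2_5",       # ~75 ms latency conversation
}

def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, defaulting to the v2 workhorse."""
    return MODEL_FOR_USE_CASE.get(use_case, "eleven_multilingual_v2")
```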

Things to know

v3 is still in alpha, so occasional bugs are possible. Accent shifts mid-generation have been reported for long content, and some reviews note that failed generations push actual costs to 2.8x the listed price. For production environments, running v2 alongside is recommended.
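One way to follow that advice is a try-v3-then-v2 wrapper. This is a sketch, not an official pattern; it reuses the endpoint and model IDs assumed in the earlier snippets, and since v2 does not understand Audio Tags, it strips them before the fallback call:

```python
# Hedged sketch of "running v2 alongside": try v3 first, fall back to
# Multilingual v2 when the alpha model errors out.
import re
import requests

def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key}
    for model_id in ("eleven_v3", "eleven_multilingual_v2"):
        body = text
        if model_id != "eleven_v3":
            body = re.sub(r"\[[^\]]+\]\s*", "", text)  # v2 has no Audio Tags
        resp = requests.post(url, headers=headers, timeout=60,
                             json={"text": body, "model_id": model_id})
        if resp.ok:
            return resp.content  # raw audio bytes
    resp.raise_for_status()  # both attempts failed: surface the last error
    return b""
```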

The essentials: how to get started

  1. Create a free account
    Sign up at elevenlabs.io for 10,000 free characters per month. All users have access to v3.
  2. Select the v3 model
    In the Text to Speech screen, open the model dropdown and select "Eleven v3." The default is v2, so manual switching is needed.
  3. Experiment with Audio Tags
    Try inserting tags like [whispers] It's a secret [normal] actually it's nothing [laughs] into your text. You'll immediately hear how natural the emotional transitions are.
  4. Try Text to Dialogue
    In the API or ElevenLabs platform, assign two voices and input dialogue text. Natural conversation is generated where each character reacts to the other.
  5. If you need voice restoration
    If you or someone you know is experiencing voice loss, you can apply for a free lifetime license at elevenlabs.io/impact-program.