Just 10 minutes of recordings brought a lost voice back. Actor Eric Dane, who lost his voice to ALS, began telling his story again in his own voice using ElevenLabs' voice restoration technology. Then in February 2026, ElevenLabs launched Eleven v3, its most expressive TTS model yet, resetting the standard for AI voice synthesis. We've moved beyond mere "reading aloud" to an era where text alone can produce voices that whisper, laugh, and sigh.

TL;DR
- Eleven v3 launched (70+ languages)
- Audio Tags for emotion & non-verbal control
- Text to Dialogue API (multi-voice)
- 11 Voices: voice restoration pledged for 1M people with voice loss

What is this?

ElevenLabs v3 has two stories happening at once. One about technology, one about people.

The technology story: the Eleven v3 model. ElevenLabs' latest voice synthesis model, launched on February 12, 2026. While the previous model (Multilingual v2) focused on "reading naturally," v3 was built to make voices "act." Three key changes stand out.

First, Audio Tags. You can insert emotion or action cues in brackets within the text. Add tags like [whispers], [excited], [sighs], [laughs] and the model adjusts tone and pace accordingly. It even supports sound effect tags like [gunshot] and [explosion], reducing the need to separately edit sound effects for audiobooks or game dialogue.
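Tagging works straight through the standard text-to-speech REST endpoint. Here's a minimal Python sketch, assuming the requests library, placeholder API key and voice ID values, and the "eleven_v3" model ID string (worth verifying against the current ElevenLabs docs):

```python
# Minimal sketch: Eleven v3 Audio Tags through the text-to-speech REST endpoint.
# YOUR_API_KEY and YOUR_VOICE_ID are placeholders; the "eleven_v3" model ID
# string is an assumption, so check the current ElevenLabs docs.
import requests

API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE_ID"  # any voice from your ElevenLabs voice library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        # Audio Tags ride inline in the text, in square brackets.
        "text": "[whispers] It's a secret... [laughs] actually, it's nothing.",
        "model_id": "eleven_v3",
    },
)
resp.raise_for_status()

with open("secret.mp3", "wb") as f:
    f.write(resp.content)  # the endpoint returns raw audio bytes (MP3 by default)
```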

Second, Text to Dialogue API. An API that weaves multiple voices into a single conversation. Specify up to 10 unique voices, and it generates natural dialogue where each character reacts to others' speech patterns. Usable anywhere multi-character audio is needed: podcasts, audiobooks, game dialogue.
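A hedged sketch of what a dialogue request can look like; the /v1/text-to-dialogue path and the "inputs" payload shape here are assumptions drawn from the launch material, so check the API reference before relying on them:

```python
# Sketch: several speakers, one request, one stitched audio file.
# The endpoint path and "inputs" schema are assumptions; verify in the docs.
import requests

API_KEY = "YOUR_API_KEY"
HOST_A, HOST_B = "VOICE_ID_A", "VOICE_ID_B"  # two voices from your library

payload = {
    "model_id": "eleven_v3",
    "inputs": [  # each entry pairs one line of dialogue with one voice
        {"voice_id": HOST_A, "text": "[excited] You won't believe this."},
        {"voice_id": HOST_B, "text": "[sighs] Go on, then."},
        {"voice_id": HOST_A, "text": "[whispers] We shipped on time."},
    ],
}
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-dialogue",
    headers={"xi-api-key": API_KEY},
    json=payload,
)
resp.raise_for_status()
open("dialogue.mp3", "wb").write(resp.content)  # single stitched audio file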

Third, 70+ language support. Broad coverage including Korean, Japanese, Chinese, Arabic, and other Asian and Middle Eastern languages, with automatic accent adjustment based on text content.

At a glance: 70+ supported languages · #1 in blind listening tests · 2.83% word error rate (industry lowest)

In independent blind listening tests, ElevenLabs placed first with 37 votes. Second place got 19. The word error rate (WER) of 2.83% is industry-leading.

The human story: the 11 Voices project. A docuseries unveiled at SXSW on March 11, 2026. Eleven people who lost their voices to ALS, cerebral palsy, and other conditions narrate their own stories using AI-restored versions of their voices. Actor Eric Dane regained his voice through ElevenLabs technology while battling ALS, and his wife Rebecca Gayheart Dane became the spokesperson for this project.

ElevenLabs co-founder Mati Staniszewski said: "When someone loses their voice, they lose their independence and their connection to the people they love." With just 10 minutes of past recordings, the company can create a nearly indistinguishable digital voice and integrate it with assistive devices for everyday conversation.

1 Million Voices campaign

ElevenLabs has pledged to provide free voice restoration technology to 1 million people experiencing voice loss, an in-kind donation valued at $1 billion. About 7,000 people have been supported so far through 800+ nonprofit partners across 49 countries. The official trailer was narrated by Sir Michael Caine using an ElevenLabs voice.

What changes?

The AI TTS market is full of choices now, so what matters is what v3 actually changes.

| | Previous TTS (v2 generation) | Eleven v3 |
| --- | --- | --- |
| Emotional expression | Flat tone, lacking nuance | Real-time emotion & non-verbal control via Audio Tags |
| Multi-speaker | Generate individually, edit manually | Natural dialogue generated at once via Text to Dialogue API |
| Languages | 29 (Multilingual v2) | 70+ with automatic accent adaptation |
| Non-verbal expression | Not possible | Inline tags: [laughs], [sighs], [whispers], etc. |
| Sound effects | Separate editing needed | Insert via tags: [gunshot], [explosion], etc. |
| Character limit | 10,000 chars (~10 min) | 5,000 chars (~5 min), quality-first design |
| Technical approach | Prosody-based synthesis | Context-aware expressive modeling |

According to CloudThat's technical analysis, v3's core architectural change is a "shift from prosody-based synthesis to context-aware expressive modeling." Emotion and intent are embedded in the generated tokens themselves, not added as post-processing effects. This keeps emotions consistent even across long texts.

There are trade-offs, of course. v3's character limit is 5,000, shorter than v2's 10,000 or Flash v2.5's 40,000. Compute costs are higher too. So ElevenLabs divided models by use case: v3 for premium content where expressiveness matters, v2 for general narration, and Flash v2.5 (latency ~75ms) for real-time conversation.
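In code, that split can be as simple as a lookup. A sketch, assuming the model ID strings eleven_v3, eleven_multilingual_v2, and eleven_flash_v2_5 (confirm the exact names in the docs):

```python
# Sketch of the use-case split described above. The model ID strings are
# assumptions based on ElevenLabs' public naming; confirm them before use.
MODEL_FOR_USE_CASE = {
    "expressive": "eleven_v3",               # premium, tag-driven acting
    "narration":  "eleven_multilingual_v2",  # stable long-form reading
    "realtime":   "eleven_flash_v2_5",       # ~75 ms latency conversation
}

def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, defaulting to the v2 workhorse."""
    return MODEL_FOR_USE_CASE.get(use_case, "eleven_multilingual_v2")
```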

Things to know

v3 is still in alpha, so occasional bugs are possible. Accent shifts mid-generation have been reported for long content, and some reviews note that failed generations push actual costs to 2.8x the listed price. For production environments, running v2 alongside is recommended.
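One way to follow that advice is a try-v3-then-v2 wrapper. This is a sketch, not an official pattern; it reuses the endpoint and model IDs assumed in the earlier snippets, and since v2 does not understand Audio Tags, it strips them before the fallback call:

```python
# Hedged sketch of "running v2 alongside": try v3 first, fall back to
# Multilingual v2 when the alpha model errors out.
import re
import requests

def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key}
    for model_id in ("eleven_v3", "eleven_multilingual_v2"):
        body = text
        if model_id != "eleven_v3":
            body = re.sub(r"\[[^\]]+\]\s*", "", text)  # v2 has no Audio Tags
        resp = requests.post(url, headers=headers, timeout=60,
                             json={"text": body, "model_id": model_id})
        if resp.ok:
            return resp.content  # raw audio bytes
    resp.raise_for_status()  # both attempts failed: surface the last error
    return b""
```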

The essentials: how to get started

  1. Create a free account
    Sign up at elevenlabs.io for 10,000 free characters per month. All users have access to v3.
  2. Select the v3 model
    In the Text to Speech screen, open the model dropdown and select "Eleven v3." The default is v2, so manual switching is needed.
  3. Experiment with Audio Tags
    Try inserting tags like [whispers] It's a secret [normal] actually it's nothing [laughs] into your text. You'll immediately hear how natural the emotional transitions are.
  4. Try Text to Dialogue
    In the API or ElevenLabs platform, assign two voices and input dialogue text. Natural conversation is generated where each character reacts to the other.
  5. If you need voice restoration
    If you or someone you know is experiencing voice loss, you can apply for a free lifetime license at elevenlabs.io/impact-program.