Text to Speech
AI text-to-speech: turn text into natural AI voiceovers with ElevenLabs, Google Cloud, and Gemini voices in 60+ languages.
Sonna's Text to Speech (TTS) studio lets you convert scripts, articles, and book chapters into high-quality spoken audio using natural-sounding AI voices.
Supported Narration Models
We integrate top-tier speech synthesis engines to provide a range of languages, latency profiles, and accents:
| Provider | Model Name | Credits per 1K chars | Max Script Length | Key Features |
|---|---|---|---|---|
| ElevenLabs | Eleven v3 | 2,100 | 5,000 chars | Maximum expression and emotional range. |
| ElevenLabs | Multilingual v2 | 2,100 | 10,000 chars | Accurate pronunciation across 29+ languages. |
| ElevenLabs | Flash v2.5 | 1,050 | 40,000 chars | Ultra-fast generation, ideal for long scripts. |
| Google Cloud | Neural2 | 500 | 3,000 chars | Standard narration, clean and clear. |
| Google Cloud | WaveNet | 500 | 3,000 chars | Balanced narration quality. |
| Gemini | 2.5 Flash | 700 | 3,000 chars | Fast generation with a conversational tone. |
| Gemini | 2.5 Pro | 1,050 | 3,000 chars | Context-aware, natural emphasis. |
Rates are shown per 1,000 characters for readability, but billing is pro-rated per character. For example, a 500-character script on Eleven v3 costs 500 × 2,100 / 1,000 = 1,050 credits, not 2,100. The minimum charge is 1 character.
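The pro-rated billing rule reduces to a one-line calculation. The sketch below is illustrative only and is not Sonna's billing code: the model keys are the display names from the table (not API identifiers), and it returns an exact `Fraction` because this page does not say how fractional credits are rounded (one character on Neural2, for instance, works out to half a credit).

```python
from fractions import Fraction

# Credits per 1,000 characters, copied from the model table.
# The keys are display labels, not API model identifiers.
RATES_PER_1K = {
    "Eleven v3": 2100,
    "Multilingual v2": 2100,
    "Flash v2.5": 1050,
    "Neural2": 500,
    "WaveNet": 500,
    "Gemini 2.5 Flash": 700,
    "Gemini 2.5 Pro": 1050,
}


def estimate_credits(model: str, char_count: int) -> Fraction:
    """Pro-rated cost in credits: rate_per_1k * characters / 1000."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    billable = max(char_count, 1)  # minimum charge is 1 character
    # Returned exactly; rounding of fractional credits is not documented.
    return Fraction(RATES_PER_1K[model] * billable, 1000)


print(estimate_credits("Eleven v3", 500))  # → 1050
print(estimate_credits("Neural2", 1))      # → 1/2
```

For a 500-character script on Eleven v3 this reproduces the 1,050-credit figure from the example above.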
Character Limits & Free Tier Restrictions
To protect backend resources, the maximum number of characters per request depends on your subscription plan:
- Free Plan Users: Limited to a maximum of 1,000 characters per request, regardless of the model chosen.
- Pro & Max Plan Users: Can generate up to the maximum script length listed for each model in the table above (e.g., up to 40,000 characters using Flash v2.5).
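These per-plan limits amount to a simple check before a request is sent. This is a hypothetical client-side sketch: the plan labels `"free"`, `"pro"`, and `"max"` and the helper names are illustrative, and counting characters with `len()` (Unicode code points) is an assumption, since this page does not define how characters are counted.

```python
# Max script length per request, copied from the model table.
# Keys are display labels, not API model identifiers.
MODEL_MAX_CHARS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Neural2": 3_000,
    "WaveNet": 3_000,
    "Gemini 2.5 Flash": 3_000,
    "Gemini 2.5 Pro": 3_000,
}
FREE_PLAN_MAX_CHARS = 1_000


def max_chars_per_request(plan: str, model: str) -> int:
    """Per-request character limit; plan labels are hypothetical."""
    model_max = MODEL_MAX_CHARS[model]
    if plan == "free":
        # Free plan: capped at 1,000 characters whatever the model.
        return min(FREE_PLAN_MAX_CHARS, model_max)
    if plan in ("pro", "max"):
        # Paid plans: the model's own maximum applies.
        return model_max
    raise ValueError(f"unknown plan: {plan!r}")


def check_script(plan: str, model: str, script: str) -> None:
    """Raise ValueError if the script exceeds the limit (len() is assumed)."""
    limit = max_chars_per_request(plan, model)
    if len(script) > limit:
        raise ValueError(
            f"script has {len(script)} characters; "
            f"limit for {plan}/{model} is {limit}"
        )


print(max_chars_per_request("free", "Flash v2.5"))  # → 1000
print(max_chars_per_request("pro", "Flash v2.5"))   # → 40000
```

Taking the minimum of the free-plan cap and the model maximum keeps the check correct even if a model with a smaller limit than 1,000 characters were ever added.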
Voice Library
Browse Sonna's curated library of ready-to-use voices, filterable by language, accent, gender, and optimal use case (e.g. narrator, storytelling, energetic, professional). The library spans all three providers:
- ElevenLabs: the most expressive voices, including premium professional voices.
- Google Cloud: Neural2 and WaveNet voices, clean and reliable across many languages.
- Gemini: natural, conversational multilingual voices.
Pick a voice from the dropdown in the Text to Speech studio, then choose a model that fits (see the table above). Save the ones you use most to your favorites.