Enhance Text (Audio Tags)
Auto-insert Eleven v3 audio tags into your text with an LLM, before synthesis.
Enhance rewrites your text to add Eleven v3 audio tags ([laughs], [whispers], [sighs], …) so narration sounds more expressive. It does not produce audio — it returns enhanced text that you then send to Synthesize with ttsModel: "eleven-v3".
What it is (and isn't)
Enhance is powered by Gemini Flash (Sonna's own implementation — there is no ElevenLabs "enhance" API). It only inserts tags; it never adds, removes, or changes your words. It is free (no credits) and synchronous.
Typical flow: /enhance → take the returned text → /synthesize with eleven-v3.
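The flow above can be sketched as a small helper that turns an Enhance response into the body of the follow-up synthesis call. Only ttsModel: "eleven-v3" and the Enhance response's text field come from this page. The helper name and the idea that Synthesize accepts a "text" field are assumptions for illustration, so check the Synthesize reference for the exact request shape.

```python
from typing import Any, Dict


def to_synthesize_body(enhance_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build a synthesis request body from an Enhance response.

    Assumption: the Synthesize endpoint takes the text in a "text" field.
    "ttsModel": "eleven-v3" is the model this page says to use.
    """
    return {
        # Tagged text returned by Enhance.
        "text": enhance_response["text"],
        "ttsModel": "eleven-v3",
    }
```

Other synthesis fields, such as the voice, are left out here because this page does not document them.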
Enhance a single text
Endpoint
POST https://api.sonnalabs.app/api/v1/tts/enhance
Request
POST https://api.sonnalabs.app/api/v1/tts/enhance
Authorization: Bearer sona_sk_your_api_key_here
Content-Type: application/json

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to enhance (max 2,000 chars). |
| voiceId | string | No | A Sonna voice ID, used so the LLM picks tags that fit the voice's traits. |
{
  "text": "Are you serious? I can't believe you did that!",
  "voiceId": "db4e815d-00aa-43e6-99cf-0d9b4db9a07a"
}
Response (200 OK)
{
  "text": "[appalled] Are you serious? [sighs] I can't believe you did that!",
  "original": "Are you serious? I can't believe you did that!"
}
Fail-soft
If enhancement fails, the endpoint still returns 200 with the original
text and "fallback": true, so your pipeline never breaks. Send the returned
text to synthesis either way.
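A minimal sketch of calling the single-text endpoint and surfacing the fail-soft flag, using only the Python standard library. The URL, headers, request fields, and the "text" and "fallback" response fields come from this page. The function names and the injectable post parameter are hypothetical conveniences, and the sketch assumes the fallback response carries the original text in the same "text" field.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional, Tuple

ENHANCE_URL = "https://api.sonnalabs.app/api/v1/tts/enhance"


def _http_post(url: str, headers: Dict[str, str], body: bytes) -> bytes:
    """Default transport: a plain urllib POST that returns the raw response body."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def enhance_text(
    text: str,
    api_key: str,
    voice_id: Optional[str] = None,
    post: Callable[[str, Dict[str, str], bytes], bytes] = _http_post,
) -> Tuple[str, bool]:
    """Return (text_to_synthesize, fell_back).

    fell_back is True when the response carried "fallback": true, in which
    case the returned text is the untagged original.
    """
    payload: Dict[str, str] = {"text": text}
    if voice_id is not None:
        payload["voiceId"] = voice_id
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    raw = post(ENHANCE_URL, headers, json.dumps(payload).encode("utf-8"))
    data = json.loads(raw)
    return data["text"], bool(data.get("fallback", False))
```

Returning the flag alongside the text lets a caller log or count fallbacks while still passing the text straight to synthesis.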
Enhance a multi-speaker dialogue
Enhances every turn in one LLM round-trip so tags use conversational context. Pair this with Multi-Speaker Dialogue.
Endpoint
POST https://api.sonnalabs.app/api/v1/tts/enhance-dialogue
| Parameter | Type | Required | Description |
|---|---|---|---|
| turns | object[] | Yes | Array of { id, text, voiceId? }. Total text across all turns ≤ 2,000 chars. id is echoed back so you can match turns. |
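The constraints in the table above can be checked client-side before sending, and the echoed id can be used to merge results back into your own turn objects. The sketch below is one way to do that. Both function names are hypothetical, and counting characters with len() (Unicode code points) is an assumption, since this page does not say how the server counts.

```python
from typing import Dict, List

MAX_TOTAL_CHARS = 2000  # documented limit across all turns


def check_turns(turns: List[Dict[str, str]]) -> None:
    """Raise ValueError if turns is empty or exceeds the total-length limit."""
    if not turns:
        raise ValueError("turns must be a non-empty list")
    total = sum(len(turn["text"]) for turn in turns)
    if total > MAX_TOTAL_CHARS:
        raise ValueError(f"total text is {total} chars, limit is {MAX_TOTAL_CHARS}")


def merge_enhanced(
    turns: List[Dict[str, str]], enhanced: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Copy enhanced text onto the original turns, matched by id.

    A turn whose id is missing from the response keeps its original text.
    """
    text_by_id = {turn["id"]: turn["text"] for turn in enhanced}
    return [{**turn, "text": text_by_id.get(turn["id"], turn["text"])} for turn in turns]
```

Merging by id rather than by position keeps voiceId attached to each turn without relying on the response order.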
{
  "turns": [
    {
      "id": "1",
      "text": "Have you tried the new model?",
      "voiceId": "voice-a"
    },
    {
      "id": "2",
      "text": "Just got it! The clarity is amazing.",
      "voiceId": "voice-b"
    }
  ]
}
Response (200 OK)
{
  "turns": [
    { "id": "1", "text": "[excited] Have you tried the new model?" },
    { "id": "2", "text": "[amazed] Just got it! The clarity is amazing." }
  ]
}
Errors
| Status | Code | Reason |
|---|---|---|
| 400 | — | text (or turns) missing/empty |
| 400 | TEXT_TOO_LONG | Text exceeds the 2,000-character Enhance limit |
| 401 | — | Missing or invalid API key |
| 429 | — | Rate limit exceeded (shares the 30/min speech limit) |
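One way a client might act on the statuses in the table above is sketched below. The action labels, function names, and backoff numbers are hypothetical choices, not part of the API. The sketch also takes the error code as an argument because this page does not show where TEXT_TOO_LONG appears in the error body.

```python
from typing import Optional


def classify_enhance_error(status: int, code: Optional[str] = None) -> str:
    """Map a documented error status (and optional code) to a client action."""
    if status == 400 and code == "TEXT_TOO_LONG":
        return "shorten"        # text is over the 2,000-character limit
    if status == 400:
        return "fix_request"    # text (or turns) missing or empty
    if status == 401:
        return "check_api_key"  # missing or invalid API key
    if status == 429:
        return "retry_later"    # rate limit shared with speech requests
    return "unexpected"


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff delay for 429 retries (illustrative values only)."""
    return min(cap, base * (2 ** attempt))
```

Only the 429 case is worth retrying automatically. The 400 and 401 cases need a change to the request or the key before another attempt can succeed.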
No credits charged
Enhance does not deduct credits. Only the subsequent synthesis call costs credits.