Speech API
Shared conventions for Text-to-Speech: endpoint, auth, voice selection, rate limits, and errors.
All Text-to-Speech models share one endpoint and the same conventions. This page documents them once; each model page then covers only what is specific to that model (limits, cost, and its own parameters).
Endpoint
POST https://api.sonnalabs.app/api/v1/tts/synthesize
Authorization: Bearer sona_sk_your_api_key_here
Content-Type: application/json

You select the model with the ttsModel field and the voice with voice. Synthesis is synchronous: the response returns the finished audio URL directly (no job_id, no polling).
{
"url": "https://cdn.sonnalabs.app/sonna/api-ephemeral/tts/paid/user123/abc123.mp3",
"remainingCredits": 99580,
"projectCreated": true
}

Output is temporary: download it
Files generated through the API are stored on a temporary prefix
(sonna/api-ephemeral/…) and are automatically deleted after 24 hours.
The API is a generation service, not file hosting — download the url and
store it on your own infrastructure. Files created in the Sonna app/web stay
in your Library; API output does not.
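As a minimal sketch of the flow described so far, the Python below builds the synthesize request and immediately copies the returned file to local storage. The endpoint, headers, the text, voice and ttsModel fields, and the url response field come from this page. The function names are hypothetical, the voice and model values you pass are placeholders, and error handling is left out.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://api.sonnalabs.app/api/v1/tts/synthesize"


def build_synthesize_request(api_key, text, voice, tts_model):
    """Build (but do not send) the POST request for one synthesis call."""
    body = json.dumps({"text": text, "voice": voice, "ttsModel": tts_model})
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_and_save(api_key, text, voice, tts_model, dest_path):
    """Send the request, then copy the temporary audio file to dest_path."""
    request = build_synthesize_request(api_key, text, voice, tts_model)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The URL points at temporary storage (deleted after 24 hours),
    # so download it right away instead of keeping the link.
    with urllib.request.urlopen(result["url"]) as audio, open(dest_path, "wb") as out:
        out.write(audio.read())
    return result
```

Calling synthesize_and_save with your key, text, a voice ID from GET /api/v1/tts/voices, and a matching model would leave the audio at dest_path and return the parsed response, including remainingCredits.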
The voice determines the provider
The provider (ElevenLabs / Gemini / Google) is resolved from the voice, not from ttsModel. Pick a voice whose provider matches your chosen model — list them with GET /api/v1/tts/voices. If a voice from another provider is sent, that provider serves the request and ttsModel is ignored.
Access
Who can use which provider
Google Cloud voices (Neural2 / Wavenet) work on every plan, including
Free. ElevenLabs and Gemini require an active Pro/Max subscription
or PAYG credits — Free-tier requests for them return 403 PAID_ACCESS_REQUIRED.
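The access rule above can be checked client-side before you spend a request. This is only a sketch: the lowercase provider strings are an assumed naming, not identifiers confirmed by the API, and has_paid_access stands for "active Pro/Max subscription or PAYG credits".

```python
# Provider names are assumptions chosen for this sketch.
FREE_PROVIDERS = {"google"}
PAID_ONLY_PROVIDERS = {"elevenlabs", "gemini"}


def can_use_provider(provider, has_paid_access):
    """Return True if the account may call a voice from this provider."""
    name = provider.lower()
    if name in FREE_PROVIDERS:
        return True  # Google Cloud voices work on every plan
    if name in PAID_ONLY_PROVIDERS:
        return has_paid_access  # otherwise the API answers 403 PAID_ACCESS_REQUIRED
    raise ValueError(f"unknown provider: {provider!r}")
```

The server remains the source of truth; a check like this only avoids a predictable 403.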
Rate limits & concurrency
- 30 requests/minute per user.
- One synthesis at a time per account: a second concurrent call returns 409 DUPLICATE_REQUEST.
- Credits are auto-refunded on any failure.
- Gemini requests also pass through a fair-use queue (Pro/Max get priority); when it's full you get 503 SERVER_BUSY.
See Rate Limits for details.
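One way to stay inside these limits from a single process is a small gate that serializes calls and tracks start times over a sliding 60-second window. The class below is a hypothetical helper, not part of any Sonna SDK, and it cannot see requests made by other processes or machines using the same account.

```python
import threading
import time
from collections import deque


class SynthesisGate:
    """Allow at most max_calls per window seconds, and one call at a time."""

    def __init__(self, max_calls=30, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._starts = deque()              # start times of recent calls
        self._in_flight = threading.Lock()  # held while a synthesis runs

    def seconds_until_allowed(self):
        """Return 0.0 if a call may start now, else how long to wait."""
        now = self._clock()
        # Forget calls that have left the sliding window.
        while self._starts and now - self._starts[0] >= self._window:
            self._starts.popleft()
        if len(self._starts) < self._max_calls:
            return 0.0
        return self._window - (now - self._starts[0])

    def run(self, call):
        """Run call() once the concurrency and rate limits both allow it."""
        with self._in_flight:
            while True:
                wait = self.seconds_until_allowed()
                if wait <= 0:
                    break
                self._sleep(wait)
            self._starts.append(self._clock())
            return call()
```

Wrapping each synthesis in gate.run(...) keeps a multi-threaded client from triggering 409 DUPLICATE_REQUEST against itself; the clock and sleep parameters exist only to make the gate testable.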
Errors
| Status | Code | Reason |
|---|---|---|
| 400 | TEXT_TOO_LONG | Text exceeds the model's limit (or your plan cap) |
| 400 | — | text or voice missing, or the voice ID is invalid |
| 402 | — | Insufficient credits |
| 403 | PAID_ACCESS_REQUIRED | Free-tier account using an ElevenLabs or Gemini voice |
| 409 | DUPLICATE_REQUEST | Another synthesis is already in progress for your account |
| 429 | — | Rate limit exceeded |
| 503 | SERVER_BUSY | Gemini synthesis queue is full — retry shortly |
| 503 | PROVIDER_BUSY | Provider temporarily over capacity — retry after Retry-After |
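The table suggests a simple split between errors worth retrying and errors that need a change on your side. The sketch below encodes one possible policy. Treating 409 as retryable, the backoff constants, and reading Retry-After as a number of seconds are assumptions of this example; how the error code is delivered in the response is not specified here, so the function takes it as a plain argument.

```python
# (status, code) pairs this sketch treats as transient.
RETRYABLE = {
    (409, "DUPLICATE_REQUEST"),
    (503, "SERVER_BUSY"),
    (503, "PROVIDER_BUSY"),
}


def retry_delay(status, code=None, retry_after=None, attempt=0, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if status == 429 or (status, code) in RETRYABLE:
        if retry_after is not None:
            try:
                return max(0.0, float(retry_after))
            except ValueError:
                pass  # not a plain number of seconds; fall back to backoff
        # Exponential backoff: base, 2*base, 4*base, ... capped at cap.
        return min(cap, base * 2 ** attempt)
    # 400, 402 and 403 need a different request, more credits, or another plan.
    return None
```

For example, retry_delay(503, "PROVIDER_BUSY", retry_after="7") yields 7.0, while retry_delay(402) yields None.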
Billing
Authenticating with an API key applies a 10% credit discount on speech. Speech is billed per character (pro-rated). Per-model rates are on each model page and in Credits & Pricing.
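As a rough illustration of the arithmetic, the helper below multiplies the character count by a per-character rate and applies the 10% API-key discount. The rate is a parameter because the real values live on the model pages; counting characters with len() and leaving the result unrounded are assumptions, so treat the output as an estimate rather than the billed amount.

```python
from decimal import Decimal

API_KEY_DISCOUNT = Decimal("0.10")  # 10% off when authenticating with an API key


def estimate_credits(text, credits_per_char, api_key_auth=True):
    """Estimate the credit cost of synthesizing text at a given per-character rate."""
    cost = Decimal(len(text)) * Decimal(str(credits_per_char))
    if api_key_auth:
        cost *= 1 - API_KEY_DISCOUNT
    return cost
```

With a hypothetical rate of 0.5 credits per character, 1,000 characters would come to 500 credits before the discount and 450 after it.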
Models
| Provider | Models | Access |
|---|---|---|
| ElevenLabs | Eleven v3, Multilingual v2, Flash v2.5 | Paid |
| Google Gemini | 2.5 Flash, 2.5 Pro | Paid |
| Google Cloud | Neural2, Wavenet | Free + Paid |
Also available: Multi-Speaker Dialogue and Enhance Text.