Speech API

Shared conventions for Text-to-Speech: endpoint, auth, voice selection, rate limits, and errors.

All Text-to-Speech models share one endpoint and the same conventions. This page documents them once; each model page then covers only what is specific to that model (limits, cost, and its own parameters).

Endpoint

POST https://api.sonnalabs.app/api/v1/tts/synthesize
Authorization: Bearer sona_sk_your_api_key_here
Content-Type: application/json

Select the model with the ttsModel field and the voice with the voice field. Synthesis is synchronous: the response returns the finished audio URL directly, with no job_id and no polling. A successful response looks like this:

{
  "url": "https://cdn.sonnalabs.app/sonna/api-ephemeral/tts/paid/user123/abc123.mp3",
  "remainingCredits": 99580,
  "projectCreated": true
}
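As a rough illustration of the request and response shapes above, here is a minimal Python sketch using only the standard library. The text, voice, and ttsModel field names come from this page; the function names, the timeout value, and any voice or model identifiers you pass in are assumptions made for the example, so check the model pages for real values.

```python
import json
import urllib.request

API_URL = "https://api.sonnalabs.app/api/v1/tts/synthesize"


def build_request(api_key: str, text: str, voice: str, tts_model: str) -> urllib.request.Request:
    """Build the POST request; nothing is sent until urlopen() is called."""
    body = json.dumps({"text": text, "voice": voice, "ttsModel": tts_model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_response(raw: bytes) -> tuple:
    """Return (audio URL, remaining credits) from a successful response body."""
    payload = json.loads(raw)
    return payload["url"], payload["remainingCredits"]


def synthesize(api_key: str, text: str, voice: str, tts_model: str, timeout: float = 120.0) -> tuple:
    """Send one synchronous synthesis call and return (url, remaining_credits).

    The 120-second timeout is an assumption, not a documented value.
    """
    request = build_request(api_key, text, voice, tts_model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

Only synthesize() touches the network; build_request() and parse_response() can be exercised offline. On an HTTP error status, urlopen() raises urllib.error.HTTPError, which carries the status code and response body.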

Output is temporary — download it

Files generated through the API are stored under a temporary prefix (sonna/api-ephemeral/…) and are deleted automatically after 24 hours. The API is a generation service, not file hosting, so download the file at url and store it on your own infrastructure. Files created in the Sonna app or on the web stay in your Library; API output does not.
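One way to keep the audio past the 24-hour window is to download it as soon as the response arrives. The sketch below is an assumption-laden example: the helper names, the destination directory, and the choice to reuse the last URL path segment as the file name are illustrative, not part of the API.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Use the last path segment of the audio URL as the local file name."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"no file name in URL: {url!r}")
    return name


def download_audio(url: str, dest_dir: str, timeout: float = 60.0) -> Path:
    """Stream the audio file into dest_dir and return the saved path."""
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / local_name(url)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return target
```

For example, download_audio(url, "audio") would save the file from the sample response as audio/abc123.mp3.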

The voice determines the provider

The provider (ElevenLabs / Gemini / Google) is resolved from the voice, not from ttsModel. Pick a voice whose provider matches your chosen model; list the available voices with GET /api/v1/tts/voices. If you send a voice that belongs to a different provider, that voice's provider serves the request and ttsModel is ignored.
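Because a mismatched voice silently overrides ttsModel, it can help to filter the voice list by provider before sending a request. This page does not show the response shape of GET /api/v1/tts/voices, so the sketch below assumes a hypothetical list of objects with id and provider fields; adjust the key names to whatever the endpoint actually returns.

```python
def voices_for_provider(voices: list, provider: str) -> list:
    """Keep only the voices served by the given provider.

    Assumes each voice is a dict with a "provider" key (hypothetical field name).
    The comparison ignores letter case.
    """
    wanted = provider.casefold()
    return [v for v in voices if str(v.get("provider", "")).casefold() == wanted]


# Hypothetical sample data, not real voice IDs.
sample_voices = [
    {"id": "voice-a", "provider": "ElevenLabs"},
    {"id": "voice-b", "provider": "Gemini"},
    {"id": "voice-c", "provider": "Google"},
]
matching = voices_for_provider(sample_voices, "gemini")
```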

Access

Who can use which provider

Google Cloud voices (Neural2 / Wavenet) work on every plan, including Free. ElevenLabs and Gemini voices require an active Pro/Max plan or PAYG credits; Free-tier requests that use them return 403 PAID_ACCESS_REQUIRED.

Rate limits & concurrency

30 requests/minute per user.
One synthesis at a time per account — a second concurrent call returns 409 DUPLICATE_REQUEST.
Credits are auto-refunded on any failure.
Gemini requests also pass through a fair-use queue (Pro/Max get priority); when it's full you get 503 SERVER_BUSY.

See Rate Limits for details.
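A client can avoid most 429 and 409 responses by pacing itself. The following is a sketch of one possible client-side guard, assuming a single process makes all calls for the account: a sliding-window limiter for the 30 requests/minute cap, plus a lock so only one synthesis is in flight. The class name and its parameters are inventions for this example.

```python
import threading
import time
from collections import deque


class SynthesisGate:
    """Client-side guard: at most `limit` calls per `window` seconds, one at a time."""

    def __init__(self, limit=30, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()            # start times of recent calls
        self._busy = threading.Lock()   # held while a synthesis is in flight

    def _wait_for_slot(self):
        while True:
            now = self._clock()
            # Forget calls that have left the sliding window.
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._limit:
                self._sent.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self._window - (now - self._sent[0]))

    def run(self, call):
        """Run call() once a rate-limit slot is free and no other call is in flight."""
        with self._busy:
            self._wait_for_slot()
            return call()
```

The server enforces its limits per user and per account, so this guard cannot help when several machines share one account; in that case the 409 and 429 responses still need handling.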

Errors

Status	Code	Reason
400	`TEXT_TOO_LONG`	Text exceeds the model's limit (or your plan cap)
400	—	`text` or `voice` missing, or the voice ID is invalid
402	—	Insufficient credits
403	`PAID_ACCESS_REQUIRED`	Free-tier account using an ElevenLabs or Gemini voice
409	`DUPLICATE_REQUEST`	Another synthesis is already in progress for your account
429	—	Rate limit exceeded
503	`SERVER_BUSY`	Gemini synthesis queue is full — retry shortly
503	`PROVIDER_BUSY`	Provider temporarily over capacity — retry after `Retry-After`
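The table above suggests a split between errors worth retrying and errors that need a change on your side. The sketch below encodes one reading of it. Treating 409 and 429 as retryable, the exponential backoff with a 30-second cap, and the handling of Retry-After as whole seconds only are all assumptions of this example, not documented behavior.

```python
from typing import Optional

# (status, code) pairs that this sketch treats as transient.
RETRYABLE = {
    (409, "DUPLICATE_REQUEST"),
    (503, "SERVER_BUSY"),
    (503, "PROVIDER_BUSY"),
}


def retry_delay(status: int, code: Optional[str], retry_after: Optional[str], attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry cannot help.

    `retry_after` is the raw Retry-After header value, if the response had one.
    `attempt` counts retries already made, starting at 0.
    """
    retryable = status == 429 or (status, code) in RETRYABLE
    if not retryable:
        return None  # 400, 402, 403: fix the request, credits, or plan instead
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # No usable header: fall back to exponential backoff capped at 30 seconds.
    return min(2.0 ** attempt, 30.0)
```

Since credits are auto-refunded on failure, a retried request should not be charged twice for the failed attempt.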

Billing

Authenticating with an API key applies a 10% credit discount on speech. Speech is billed per character (pro-rated). Per-model rates are on each model page and in Credits & Pricing.
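To budget before sending text, you can estimate the charge from the character count. This is only a sketch: the per-character rate below is a placeholder rather than a real price, and counting characters with len() and leaving the result unrounded are assumptions about how billing works. Take real rates from the model pages.

```python
from decimal import Decimal

API_KEY_DISCOUNT = Decimal("0.10")  # 10% discount for API-key requests, per this page


def estimate_credits(text: str, credits_per_char: Decimal) -> Decimal:
    """Estimate the credits charged for `text` at a given per-character rate.

    `credits_per_char` is a hypothetical input; this page does not list rates.
    """
    base = Decimal(len(text)) * credits_per_char
    return base * (1 - API_KEY_DISCOUNT)


# 1,000 characters at a placeholder rate of 0.5 credits per character.
estimate = estimate_credits("a" * 1000, Decimal("0.5"))
```

At that placeholder rate, the undiscounted cost would be 500 credits and the discounted estimate is 450.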

Models

Provider	Models	Access
ElevenLabs	Eleven v3, Multilingual v2, Flash v2.5	Paid
Google Gemini	2.5 Flash, 2.5 Pro	Paid
Google Cloud	Neural2, Wavenet	Free + Paid

Also available: Multi-Speaker Dialogue and Enhance Text.