Overview
How the Sonna Text-to-Speech API is organized — the endpoint, voices, and conventions.
The Sonna API is a Text-to-Speech service. You send text and a voice, and it returns a finished audio file synchronously — no jobs, no polling.
| Entry point | Produces | Mode |
|---|---|---|
| POST /api/v1/tts/synthesize | Speech (Text-to-Speech) | Synchronous; returns the audio URL directly |
Base URL for every request:
https://api.sonnalabs.app
App vs API
Image, Video, and Music generation are available inside the Sonna app/web, but are not exposed through the public developer API. The API surface is Text-to-Speech only.
The endpoint
Speech — POST /api/v1/tts/synthesize
Single-speaker synthesis. Pick a voice and a ttsModel. Two variant endpoints share the same auth and conventions:
- POST /api/v1/tts/synthesize-dialogue: multi-speaker dialogue (ElevenLabs v3).
- POST /api/v1/tts/enhance: auto-insert expressive audio tags into your text.
The provider (ElevenLabs / Gemini / Google) is resolved from the voice, not the model. See Text-to-Speech Models for per-model details and the shared Speech API page for conventions.
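As a rough illustration of the synthesize call described above, the following Python sketch builds the request without sending it. The JSON body shape is an assumption: only the `ttsModel` name appears on this page, while the `text` and `voice` keys, the helper name `build_synthesize_request`, and the example IDs are hypothetical. Check the Speech API page for the actual request schema.

```python
import json
import urllib.request

BASE_URL = "https://api.sonnalabs.app"  # base URL stated on this page


def build_synthesize_request(
    api_key: str, text: str, voice: str, tts_model: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the synthesize endpoint."""
    # ASSUMPTION: the body is JSON with "text", "voice", and "ttsModel" keys.
    # Only the ttsModel name is confirmed by this page; the rest is a guess.
    payload = {"text": text, "voice": voice, "ttsModel": tts_model}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/tts/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because the call is synchronous, the response should arrive in that same round trip, though its exact fields are not documented on this page.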
Supporting endpoints
| Endpoint | Purpose | Section |
|---|---|---|
| GET /api/v1/tts/voices | List available voices | Models & Voices |
| GET /api/v1/tts/models | List Text-to-Speech models | Models & Voices |
| GET /api/v1/user/credits | Credit balance & plan | Models & Voices |
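The three supporting endpoints in the table are plain GET requests, so one small helper can cover them. This is a minimal sketch: the paths and base URL come from this page, but the short names (`voices`, `models`, `credits`) and the helper `build_get_request` are hypothetical, and the response formats are not shown here.

```python
import urllib.request

BASE_URL = "https://api.sonnalabs.app"  # base URL stated on this page

# Paths copied from the table above; the dictionary keys are illustrative.
SUPPORTING_ENDPOINTS = {
    "voices": "/api/v1/tts/voices",
    "models": "/api/v1/tts/models",
    "credits": "/api/v1/user/credits",
}


def build_get_request(api_key: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one supporting endpoint."""
    path = SUPPORTING_ENDPOINTS[name]  # raises KeyError for unknown names
    return urllib.request.Request(
        url=BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Listing voices first is a natural starting point, since the provider is resolved from the voice.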
Authentication
Every request needs a developer API key (sona_sk_...):
Authorization: Bearer sona_sk_your_api_key_here
API keys are available to paying users and apply a 10% credit discount on Speech. See Authentication.
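One way to keep the key out of source code is to read it from the environment and check its prefix before building the header. In this sketch the `sona_sk_` prefix comes from this page, while the environment variable name `SONNA_API_KEY` and the helper `auth_headers` are assumptions made for illustration.

```python
import os
from typing import Dict, Optional

KEY_PREFIX = "sona_sk_"  # prefix shown on this page


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for a developer API key."""
    # ASSUMPTION: SONNA_API_KEY is a hypothetical variable name, not an
    # official convention of the API.
    key = api_key if api_key is not None else os.environ.get("SONNA_API_KEY", "")
    if not key.startswith(KEY_PREFIX):
        # Fail before any request is made rather than waiting for a rejection.
        raise ValueError(f"API key should start with {KEY_PREFIX!r}")
    return {"Authorization": f"Bearer {key}"}
```

The prefix check is only a local sanity test; the server remains the authority on whether a key is valid.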