Gemini 2.5 Pro

Google Gemini 2.5 Pro TTS — highest fidelity with advanced style instructions.

The highest-fidelity Gemini voice — richer emotional range and stronger adherence to style instructions. Best for premium narration and audiobooks where nuance matters.


Endpoint	`POST /api/v1/tts/synthesize`
Model ID (`ttsModel`)	`gemini-2-5-pro`
Provider	Google Gemini (paid access)
Character Limit	3,000
Cost	1,050 / 1K chars

Shared conventions

Auth, the voice→provider rule, rate limits, and the full error table live on the Speech API page. Below is only what's specific to Gemini 2.5 Pro.

Model-specific notes

Follows style_instructions more precisely than Flash — ideal for directed, audiobook-style delivery.
Gemini requests pass through a fair-use queue (Pro/Max get priority); when it's full you get 503 SERVER_BUSY.

When `style_instructions` applies

Custom instructions are used only when both style_instructions_enabled is true and the instruction is at least 30 characters. Shorter instructions are ignored.

Request body

Parameter	Type	Required	Description
`text`	string	Yes	Text to synthesize (max 3,000 chars)
`voice`	string	Yes	A Gemini voice ID from GET /api/v1/tts/voices
`ttsModel`	string	Yes	Must be `"gemini-2-5-pro"`
`style_instructions`	string	No	Free-form delivery instruction (≥ 30 chars to take effect)
`style_instructions_enabled`	boolean	No	Activate `style_instructions` · Default: `false`
`speed`	number	No	Speaking rate · Default: `1.0`
`title`	string	No	Title for the saved project

{
  "text": "Chapter one. The world had changed — and nobody noticed at first.",
  "voice": "voice-uuid-gemini-pro",
  "ttsModel": "gemini-2-5-pro",
  "style_instructions": "Narrate like a classic audiobook — calm, measured, with subtle dramatic pauses.",
  "style_instructions_enabled": true
}

Response (200 OK)

{
  "url": "https://cdn.sonnalabs.app/sonna/api-ephemeral/tts/paid/user123/mno345.mp3",
  "remainingCredits": 100195,
  "projectCreated": true
}

Errors (400 TEXT_TOO_LONG, 403 PAID_ACCESS_REQUIRED, 409, 429, 503 SERVER_BUSY when the queue is full) follow the shared Speech API table.

Model-specific notes

Request body

Response (200 OK)

On this page