MCP Server

Let Claude Code, Cursor, Cline, or any MCP-aware agent speak in one of your cloned voices — locally, with no cloud.

Overview

Sonna ships a built-in Model Context Protocol server so local AI agents can call your Sonna install directly: speak text in a voice profile, transcribe audio, and list captures or profiles. The server runs inside the same process as the rest of Sonna and is mounted at /mcp over Streamable HTTP.

Agent asks to speak → Sonna plays audio on your speakers → an on-screen pill surfaces the voice name for the whole duration so you always see what's coming out of your machine.

MCP shipped in 0.5.0 alongside Dictation and Voice Personalities. The design goal is "local voice layer for every agent on your machine" — the same app that captures your voice can generate a response in any voice profile you've cloned.

Quick install

Claude Code

claude mcp add --transport http sonna http://127.0.0.1:17493/mcp \
  --header "X-Sonna-Client-Id: claude-code"

Cursor / Windsurf / VS Code MCP / any HTTP MCP client

Drop this into the client's MCP config (usually .mcp.json or a Settings UI):

{
  "mcpServers": {
    "sonna": {
      "url": "http://127.0.0.1:17493/mcp",
      "headers": { "X-Sonna-Client-Id": "cursor" }
    }
  }
}

Change cursor to whatever name you want the binding to show up as in Sonna → Settings → MCP. The value is just an identifier for the per-client voice binding — not a secret, not a credential.

Clients that only speak stdio

A stdio shim binary, sonna-mcp, is bundled with the desktop app. Point the client at that binary's absolute path for your platform.

macOS

{
  "mcpServers": {
    "sonna": {
      "command": "/Applications/Sonna.app/Contents/MacOS/sonna-mcp",
      "env": { "SONNA_CLIENT_ID": "claude-desktop" }
    }
  }
}

Windows

{
  "mcpServers": {
    "sonna": {
      "command": "C:\\Program Files\\Sonna\\sonna-mcp.exe",
      "env": { "SONNA_CLIENT_ID": "claude-desktop" }
    }
  }
}

Linux

{
  "mcpServers": {
    "sonna": {
      "command": "/opt/sonna/sonna-mcp",
      "env": { "SONNA_CLIENT_ID": "claude-desktop" }
    }
  }
}

The shim waits up to 30 seconds for the Sonna backend to come up, then proxies JSON-RPC from stdio over Streamable HTTP. Sonna must be running for the shim to connect.
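The wait-then-proxy behavior described above amounts to a bounded polling loop. The sketch below is illustrative Python, not the shim's actual source: `wait_for_backend`, the `probe` callable, and the 0.5-second poll interval are all assumptions. Only the 30-second ceiling comes from the text.

```python
import time
from typing import Callable


def wait_for_backend(
    probe: Callable[[], bool],
    timeout: float = 30.0,    # ceiling documented for the shim
    interval: float = 0.5,    # assumed poll interval, not documented
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as probe() succeeds, False once timeout elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

`probe` can be anything that answers "is the backend reachable yet?", for example a TCP connect to 127.0.0.1:17493. Injecting `clock` and `sleep` keeps the loop testable without real waiting.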

Tools

Tool	Use
`sonna.speak`	Speak text in a voice profile. Returns a `generation_id` to poll.
`sonna.transcribe`	Whisper transcription of base64 audio or an absolute local path.
`sonna.list_captures`	Recent captures with transcripts, paginated.
`sonna.list_profiles`	Available voice profiles (cloned + preset).
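On the wire, each of these tools is invoked through MCP's standard `tools/call` JSON-RPC method. The sketch below only builds the message body. The helper name `tools_call` is hypothetical, and a real client also performs the MCP initialize handshake and session handling first, which an MCP client library normally does for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


print(tools_call("sonna.list_profiles", {}))
```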

`sonna.speak`

sonna.speak({
  text: "Deploy complete.",
  profile?: "Morgan",            // name or id; falls back to per-client binding, then default
  engine?: "qwen",               // qwen | qwen_custom_voice | luxtts | chatterbox | chatterbox_turbo | tada | kokoro
  personality?: true,            // rewrite via the profile's personality LLM before TTS; default comes from the per-client binding
  language?: "en",
})

Returns:

{
  "generation_id": "…",
  "status": "generating",
  "profile": "Morgan",
  "source": "mcp",
  "poll_url": "/generate/<id>/status"
}
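Since the call returns before audio is ready, a caller that needs to know when generation finishes has to poll. The sketch below rests on several assumptions that the text above does not confirm: `poll_url` is relative to the server root, a GET on it returns a JSON object with a `status` field, and any status other than "generating" is terminal. If the endpoint streams events instead, the loop would need to change.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:17493"


def http_get_json(path: str) -> dict:
    """GET a path on the local Sonna server and decode the JSON body (assumed shape)."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def wait_for_generation(
    poll_url: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval: float = 0.5,     # assumed poll interval
    max_polls: int = 120,      # assumed upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is no longer "generating", then return the last body."""
    for _ in range(max_polls):
        body = fetch(poll_url)
        if body.get("status") != "generating":
            return body
        sleep(interval)
    raise TimeoutError(f"still generating after {max_polls} polls: {poll_url}")
```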

Plain TTS: personality is false, or it is omitted and the binding default is false. Text is spoken as-is.
Persona mode: personality is true and the profile has a personality prompt set. The LLM rewrites the text in character before TTS. See Voice Personalities.
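The choice between the two modes reduces to a small decision function. This is a sketch, not Sonna's code: `speak_mode` is a hypothetical name, and raising an error when persona mode is requested for a profile without a personality prompt is an assumption, since the text only says the prompt must be set.

```python
from typing import Optional


def speak_mode(personality: Optional[bool], binding_default: bool, has_personality: bool) -> str:
    """Return "plain" or "persona" for one speak call."""
    # An explicit argument wins; otherwise the per-client binding default applies.
    wants_persona = binding_default if personality is None else personality
    if not wants_persona:
        return "plain"
    if not has_personality:
        # Assumed behavior: the real server may handle this case differently.
        raise ValueError("persona mode needs a profile with a personality prompt")
    return "persona"
```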

`sonna.transcribe`

sonna.transcribe({
  audio_base64?: "<base64>",    // exactly one of these two
  audio_path?: "/absolute/path/to/file.wav",
  language?: "en",
  model?: "turbo",              // base | small | medium | large | turbo
})

Returns { text, duration, language, model }. Both input modes are capped at 200 MB.
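A client can enforce the "exactly one of" rule and the size ceiling before sending anything. The helper below is a hypothetical client-side sketch. It assumes the 200 MB ceiling means 200 × 1024 × 1024 bytes and applies to the decoded audio, neither of which the text specifies.

```python
import base64
import os
from typing import Optional

MAX_BYTES = 200 * 1024 * 1024  # assumed reading of the 200 MB ceiling


def transcribe_args(
    audio_path: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict:
    """Build the arguments object for sonna.transcribe."""
    if (audio_path is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of audio_path or audio_bytes")
    args: dict = {}
    if audio_path is not None:
        if not os.path.isabs(audio_path):
            raise ValueError("audio_path must be an absolute local path")
        args["audio_path"] = audio_path
    else:
        if len(audio_bytes) > MAX_BYTES:
            raise ValueError("audio exceeds the 200 MB ceiling")
        args["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    if language is not None:
        args["language"] = language
    if model is not None:
        args["model"] = model
    return args
```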

`sonna.list_captures`

{ limit?: 20, offset?: 0 } → { captures: [...], total }. limit is clamped to the range 1 to 200, inclusive.
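Walking the full capture list means advancing `offset` until `total` is reached. The sketch below shows one way to do that. `iter_captures` and the injected `list_captures` callable are hypothetical, with the callable standing in for however your client invokes the tool.

```python
from typing import Callable, Iterator


def clamp_limit(limit: int) -> int:
    """Mirror the documented server-side clamp to the range 1 to 200."""
    return max(1, min(200, limit))


def iter_captures(
    list_captures: Callable[[int, int], dict],  # (limit, offset) -> {"captures": [...], "total": n}
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield every capture by paging through sonna.list_captures."""
    limit = clamp_limit(page_size)
    offset = 0
    while True:
        page = list_captures(limit, offset)
        captures = page["captures"]
        yield from captures
        offset += len(captures)
        # Stop on an empty page too, in case total changes mid-iteration.
        if not captures or offset >= page["total"]:
            return
```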

`sonna.list_profiles`

No args → { profiles: [{ id, name, voice_type, language, has_personality }] }.

Voice resolution

Every call to sonna.speak (and POST /speak) resolves the voice profile in this order:

1. Explicit profile argument. Passed as a name (case-insensitive) or id. If the name/id doesn't match, the call errors; the server doesn't silently fall back.

2. Per-client binding. Looked up by the X-Sonna-Client-Id header. Managed in Sonna → Settings → MCP. Lets you pin Claude Code to Morgan, Cursor to Scarlett, etc.

3. Global default. capture_settings.default_playback_voice_id, the same default voice the Captures tab's "Play as voice" action uses.

If none of the three produces a profile, the tool returns an error pointing at Settings.
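The resolution order above can be sketched as a single function. This is an illustration, not the server's implementation: `resolve_profile`, `ProfileNotFound`, and the dict-based lookups are hypothetical, and letting a binding that points at a missing profile fall through to the default is an assumption.

```python
from typing import Dict, Optional


class ProfileNotFound(Exception):
    pass


def resolve_profile(
    profile_arg: Optional[str],
    client_id: Optional[str],
    profiles: Dict[str, str],   # profile id -> display name
    bindings: Dict[str, str],   # client id -> profile id
    default_id: Optional[str],
) -> str:
    """Return the profile id a speak call should use."""
    # 1. An explicit argument must match, or the call fails outright.
    if profile_arg is not None:
        wanted = profile_arg.casefold()
        for pid, name in profiles.items():
            if pid == profile_arg or name.casefold() == wanted:
                return pid
        raise ProfileNotFound(f"no profile named or identified by {profile_arg!r}")
    # 2. Per-client binding, keyed by the X-Sonna-Client-Id value.
    bound = bindings.get(client_id) if client_id else None
    if bound in profiles:
        return bound
    # 3. Global default playback voice.
    if default_id in profiles:
        return default_id
    raise ProfileNotFound("no profile, binding, or default voice; check Settings")
```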

Per-client bindings

Sonna → Settings → MCP shows one row per client_id Sonna has heard from, plus the config snippets you can copy into each agent. Each row carries:

Field	Purpose
`label`	Display name in the Settings UI (e.g. "Claude Code").
`profile_id`	The voice this client uses when `profile` isn't passed.
`default_engine`	Override the TTS engine for this client.
`default_personality`	When true, `sonna.speak` routes through the profile's personality LLM (rewrite) by default.
`last_seen_at`	Last time the server saw a request from this client.

last_seen_at is stamped automatically by middleware on every /mcp/* request — useful when you're not sure whether your config took.

The speaking pill

Every agent-initiated speak surfaces the floating pill the same way Dictation does, in a new Speaking state showing the profile name and an elapsed timer. The pill is intentionally unmissable — silent background TTS is a trust hazard, so Sonna always shows what's being spoken and in what voice.

Behind the scenes, the backend broadcasts speak-start and speak-end events on GET /events/speak, which DictateWindow subscribes to via SSE. The pill overrides the capture session when both would render, so only one pill is ever on screen.
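If you want to watch the same event stream from a script, a small Server-Sent Events parser is enough. The framing below follows the standard SSE format. What is assumed, and not confirmed by the text, is that the event names arrive in the `event:` field and that the payload is JSON with a `profile` key.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # blank line dispatches the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):          # comment / keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def speaking_profile(events: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Track which profile is speaking after replaying the events (assumed payload shape)."""
    current = None
    for name, payload in events:
        if name == "speak-start":
            current = json.loads(payload).get("profile")
        elif name == "speak-end":
            current = None
    return current
```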

Non-MCP REST surface

POST /speak is a thin wrapper on the same code path for callers that don't speak MCP — shell scripts, ACP, A2A, GitHub Actions, whatever.

curl -X POST http://127.0.0.1:17493/speak \
  -H 'Content-Type: application/json' \
  -H 'X-Sonna-Client-Id: ci' \
  -d '{"text":"Build complete.","profile":"Morgan"}'

Body fields match the MCP tool: text, optional profile, engine, personality, language. Returns a GenerationResponse — the same shape as POST /generate.
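The same request from Python needs only the standard library. The sketch below builds the request without sending it. `build_speak_request` is a hypothetical helper, and the field names are the ones listed above.

```python
import json
import urllib.request

ALLOWED_OPTIONAL = {"profile", "engine", "personality", "language"}


def build_speak_request(
    text: str,
    client_id: str,
    base_url: str = "http://127.0.0.1:17493",
    **optional,
) -> urllib.request.Request:
    """Build (but do not send) a POST /speak request."""
    unknown = set(optional) - ALLOWED_OPTIONAL
    if unknown:
        raise TypeError(f"unsupported fields: {sorted(unknown)}")
    body = {"text": text}
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        base_url + "/speak",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Sonna-Client-Id": client_id,
        },
        method="POST",
    )
```

With Sonna running, passing the result to `urllib.request.urlopen` sends it.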

Debugging

Use the MCP Inspector to poke tools directly without plumbing through an agent:

npx @modelcontextprotocol/inspector http://127.0.0.1:17493/mcp

Start with sonna.list_profiles to confirm wiring, then sonna.speak for end-to-end — you should hear audio and see the generation land in the Captures tab.

If an agent can't reach the server, the first thing to check is that Sonna is running — the backend only listens while the desktop app is open. The stdio shim surfaces this as a JSON-RPC error on the client side after its 30-second health-wait window elapses.

Security

Localhost only. The server binds to 127.0.0.1. If you ever point Sonna at a non-loopback interface (e.g. remote mode over a trusted network), you will want bearer-token auth; it's on the roadmap but not in 0.5.0.
No auth today. Any process that can connect to your loopback interface can call MCP. That's the same trust boundary as the rest of Sonna's REST API and is appropriate for a single-user local tool.
audio_path reads are not sandboxed. They sit inside the same trust boundary, so if you're scripting against a shared host, prefer audio_base64 and you don't have to think about path sandboxing.
Voice cloning consent applies. See Voice Cloning — an agent being able to call sonna.speak in someone's voice doesn't change the ethics of whose voices you clone.

Implementation notes

Transport: Streamable HTTP (Nov-2025 MCP spec, post-SSE). Claude Code, Cursor, Windsurf, and VS Code MCP extensions all support it.
Package naming: the backend package is backend/mcp_server/, not mcp, to avoid shadowing the PyPI mcp package FastMCP imports internally.
Dependencies: fastmcp>=3.0,<4.0, sse-starlette>=2.0.
Lifespan: mounting FastMCP requires the lifespan= kwarg on FastAPI() — the startup/shutdown event decorators are incompatible with FastMCP's Streamable HTTP session manager. The Sonna app.py composes both into one async context manager.
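The lifespan point is easier to see in code. The sketch below shows only the composition pattern, using stand-in context managers. `app_lifespan` and `mcp_lifespan` are hypothetical placeholders for Sonna's own startup/shutdown logic and for whatever lifespan the mounted FastMCP app exposes. In a real app, `combined_lifespan` is what you would pass as `FastAPI(lifespan=...)`.

```python
import asyncio
from contextlib import asynccontextmanager

log = []


@asynccontextmanager
async def app_lifespan(app):
    # Stand-in for the app's own startup/shutdown work.
    log.append("app up")
    try:
        yield
    finally:
        log.append("app down")


@asynccontextmanager
async def mcp_lifespan(app):
    # Stand-in for the MCP session manager's lifespan.
    log.append("mcp up")
    try:
        yield
    finally:
        log.append("mcp down")


@asynccontextmanager
async def combined_lifespan(app):
    # Nesting enters both on startup and exits them in reverse order on shutdown.
    async with app_lifespan(app):
        async with mcp_lifespan(app):
            yield


async def demo():
    async with combined_lifespan(app=None):
        log.append("serving")


asyncio.run(demo())
print(log)
```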

For the full developer-facing tour of the code layout, see backend/mcp_server/README.md in the repo.