Preset Voices

Use built-in, ready-made voices without recording audio samples

Overview

Some Sonna engines ship with a curated set of pre-built voices. Instead of cloning from your own audio sample, you pick a voice from a fixed catalog and the model speaks in that voice. No recording, no upload, no per-voice training required.

Two engines in Sonna 0.4 ship preset voices:

Engine | Voices | Languages | Strengths
Kokoro 82M | 50 | 9 | Tiny model, CPU-friendly, lowest VRAM of any engine
Qwen CustomVoice | 9 (premium curated) | 4 speaker languages (10 supported) | Natural-language style control over tone, emotion, and pace

Want to clone a specific person's voice instead? See Voice Cloning.

When to Use Preset Voices

  • No reference audio: you don't have (or don't want to provide) a recording of the target voice.
  • Production reliability: curated voices have predictable quality across any text input.
  • Speed: skip the audio cleanup, sample preparation, and quality-iteration loop.
  • Lightweight setup: Kokoro runs in realtime on CPU with ~150 MB on disk; no GPU needed.

Creating a Preset-Voice Profile

  1. Start a new profile from the same entry point used for cloning profiles.
  2. Select Kokoro or Qwen CustomVoice from the engine dropdown.
  3. The voice catalog for the chosen engine appears; click a voice to preview it.
  4. Give the profile a name. No audio sample is needed; just save.
  5. Use the profile like any other, in the floating generate box or on the Generate page.

Preset profiles are locked to their source engine: the voice exists only in that model, so switching engines won't work. When you switch to a different engine, the profile grid greys out preset profiles, and clicking one automatically switches back to the engine it belongs to.
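The engine-locking rule can be modelled as two small checks over a profile. This is a sketch only: the `Profile` class, its fields, and engine keys such as `"kokoro"` are hypothetical names, not Sonna's internal API. The behaviour for cloned profiles (leave the active engine alone) is an assumption, since the text above only describes preset profiles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    engine: str                         # hypothetical engine key, e.g. "kokoro"
    preset_voice: Optional[str] = None  # set for preset profiles, None for cloned ones

    @property
    def is_preset(self) -> bool:
        return self.preset_voice is not None


def is_greyed_out(profile: Profile, active_engine: str) -> bool:
    """A preset profile only works on its source engine, so grey it out elsewhere."""
    return profile.is_preset and profile.engine != active_engine


def engine_after_click(profile: Profile, active_engine: str) -> str:
    """Clicking a preset profile switches back to the engine that owns its voice."""
    if profile.is_preset:
        return profile.engine
    # Assumption: cloned profiles do not force an engine switch in this sketch.
    return active_engine


narrator = Profile("Narrator", engine="kokoro", preset_voice="af_heart")
```

With the active engine set to `"qwen-customvoice"`, `is_greyed_out(narrator, ...)` is true and `engine_after_click` returns `"kokoro"`.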

Kokoro 82M — 50 Voices Across 9 Languages

Kokoro is the smallest engine in Sonna at 82M parameters. It runs in realtime on CPU with negligible VRAM, which makes it the best option for lightweight local inference. Its voices are pre-built style vectors that ship with the model, so there is no cloning here.

Repository: hexgrad/Kokoro-82M · Apache 2.0 licensed

American English

Gender | Voices
Female | Alloy, Aoede, Bella, Heart, Jessica, Kore, Nicole, Nova, River, Sarah, Sky
Male | Adam, Echo, Eric, Fenrir, Liam, Michael, Onyx, Puck, Santa

British English

Gender | Voices
Female | Alice, Emma, Isabella, Lily
Male | Daniel, Fable, George, Lewis

Other Languages

Language | Voices
Spanish (es) | Dora (f), Alex (m), Santa (m)
French (fr) | Siwis (f)
Hindi (hi) | Alpha (f), Beta (f), Omega (m), Psi (m)
Italian (it) | Sara (f), Nicola (m)
Japanese (ja) | Alpha (f), Gongitsune (f), Nezumi (f), Tebukuro (f), Kumo (m)
Portuguese (pt) | Dora (f), Alex (m), Santa (m)
Chinese (zh) | Xiaobei (f), Xiaoni (f), Xiaoxiao (f), Xiaoyi (f)
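In the upstream hexgrad/Kokoro-82M repository, each voice is named with a two-letter prefix (language letter, then `f` or `m` for gender) followed by the lowercased name, for example `af_heart` or `jm_kumo`. The sketch below rebuilds those IDs from the tables above and confirms the catalog holds 50 distinct voices. Whether Sonna stores voices under exactly these IDs is an assumption, and the language keys such as `"en-us"` are hypothetical.

```python
# Upstream Kokoro language prefixes (first letter of a voice ID).
LANG_PREFIX = {
    "en-us": "a",  # American English
    "en-gb": "b",  # British English
    "es": "e",
    "fr": "f",
    "hi": "h",
    "it": "i",
    "ja": "j",
    "pt": "p",     # Brazilian Portuguese in the upstream repo
    "zh": "z",
}


def kokoro_voice_id(lang: str, gender: str, name: str) -> str:
    """Build an upstream-style voice ID such as 'af_heart' from catalog fields."""
    if gender not in ("f", "m"):
        raise ValueError(f"gender must be 'f' or 'm', got {gender!r}")
    return f"{LANG_PREFIX[lang]}{gender}_{name.lower()}"


# The catalog from the tables above, keyed by (language, gender).
CATALOG = {
    ("en-us", "f"): ["Alloy", "Aoede", "Bella", "Heart", "Jessica", "Kore",
                     "Nicole", "Nova", "River", "Sarah", "Sky"],
    ("en-us", "m"): ["Adam", "Echo", "Eric", "Fenrir", "Liam", "Michael",
                     "Onyx", "Puck", "Santa"],
    ("en-gb", "f"): ["Alice", "Emma", "Isabella", "Lily"],
    ("en-gb", "m"): ["Daniel", "Fable", "George", "Lewis"],
    ("es", "f"): ["Dora"], ("es", "m"): ["Alex", "Santa"],
    ("fr", "f"): ["Siwis"],
    ("hi", "f"): ["Alpha", "Beta"], ("hi", "m"): ["Omega", "Psi"],
    ("it", "f"): ["Sara"], ("it", "m"): ["Nicola"],
    ("ja", "f"): ["Alpha", "Gongitsune", "Nezumi", "Tebukuro"], ("ja", "m"): ["Kumo"],
    ("pt", "f"): ["Dora"], ("pt", "m"): ["Alex", "Santa"],
    ("zh", "f"): ["Xiaobei", "Xiaoni", "Xiaoxiao", "Xiaoyi"],
}

ALL_IDS = [kokoro_voice_id(lang, gender, name)
           for (lang, gender), names in CATALOG.items()
           for name in names]
```

Names that repeat across languages (Dora, Alex, Santa, Alpha) stay unique because the language letter differs, e.g. `ef_dora` versus `pf_dora`.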

Kokoro at a Glance

Property | Value
Parameters | 82M
Sample rate | 24 kHz
VRAM | ~150 MB (none needed when running on CPU)
Speed | Realtime on CPU, faster on GPU
Instruct | Not supported (the preset voice carries the style)
License | Apache 2.0
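"Realtime" is commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values at or below 1.0 mean generation keeps up with playback. A minimal sketch using Kokoro's 24 kHz output rate; the timing in the last line is an illustrative number, not a Sonna benchmark.

```python
SAMPLE_RATE_HZ = 24_000  # Kokoro's output sample rate (24 kHz)


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a generated clip in seconds."""
    return num_samples / sample_rate


def real_time_factor(num_samples: int, elapsed_seconds: float,
                     sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Synthesis time divided by audio duration; <= 1.0 keeps up with playback."""
    duration = audio_seconds(num_samples, sample_rate)
    if duration <= 0:
        raise ValueError("no audio was generated")
    return elapsed_seconds / duration


# Illustrative only: 10 s of audio (240,000 samples) produced in 8 s of compute.
rtf = real_time_factor(240_000, elapsed_seconds=8.0)
```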

Qwen CustomVoice — 9 Premium Voices with Instruct Control

Qwen CustomVoice ships with 9 curated speakers and supports natural-language style control — you tell the model how to deliver the line ("speak slowly with warmth", "authoritative and clear") and it adapts tone, emotion, and pace.

Two model sizes:

  • 1.7B — full quality, recommended default
  • 0.6B — lighter, faster, lower-end hardware

Repository: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (and 0.6B variant) · by Alibaba

Voice Catalog

Speaker | Gender | Language | Description
Vivian | female | Chinese | Bright, slightly edgy young female voice
Serena | female | Chinese | Warm, gentle young female voice
Uncle Fu | male | Chinese | Seasoned male voice with a low, mellow timbre
Dylan | male | Chinese | Youthful Beijing male voice with a clear, natural timbre
Eric | male | Chinese | Lively Chengdu male voice with a slightly husky brightness
Ryan | male | English | Dynamic male voice with strong rhythmic drive (default)
Aiden | male | English | Sunny American male voice with a clear midrange
Ono Anna | female | Japanese | Playful Japanese female voice with a light, nimble timbre
Sohee | female | Korean | Warm Korean female voice with rich emotion
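A catalog like this is easy to hold as plain data, with lookups by language and a fallback to the default speaker. In the sketch below, the `Speaker` class, the case-insensitive matching, and the fallback behaviour are assumptions for illustration; the only grounded detail is that Ryan is the default. The speaker IDs the Qwen model actually expects may be spelled differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Speaker:
    name: str
    gender: str
    language: str  # the speaker's native language from the table


# Hypothetical in-code copy of the catalog table above.
CUSTOMVOICE_SPEAKERS: List[Speaker] = [
    Speaker("Vivian", "female", "Chinese"),
    Speaker("Serena", "female", "Chinese"),
    Speaker("Uncle Fu", "male", "Chinese"),
    Speaker("Dylan", "male", "Chinese"),
    Speaker("Eric", "male", "Chinese"),
    Speaker("Ryan", "male", "English"),
    Speaker("Aiden", "male", "English"),
    Speaker("Ono Anna", "female", "Japanese"),
    Speaker("Sohee", "female", "Korean"),
]
DEFAULT_SPEAKER = "Ryan"  # the table marks Ryan as the default


def speakers_for(language: str) -> List[str]:
    """Names of the preset speakers whose native language matches."""
    return [s.name for s in CUSTOMVOICE_SPEAKERS if s.language == language]


def resolve_speaker(name: Optional[str] = None) -> Speaker:
    """Case-insensitive lookup that falls back to the default speaker."""
    wanted = (name or DEFAULT_SPEAKER).lower()
    for speaker in CUSTOMVOICE_SPEAKERS:
        if speaker.name.lower() == wanted:
            return speaker
    raise KeyError(f"unknown CustomVoice speaker: {name!r}")
```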

Using Instruct Mode

In the floating generate box, switch to a Qwen CustomVoice profile and click the delivery instructions toggle (slider icon, left of the generate button). A second textarea appears below the main text:

  • Main text → what you want the voice to say
  • Instruct text → how you want it delivered

Examples of effective instruct prompts:

  • Speak slowly with emphasis, like reading bedtime stories
  • Warm and friendly, conversational tone
  • Professional and authoritative, broadcast quality
  • Whisper, intimate and close
  • Excited and energetic, like sports commentary

The full Generate page also surfaces the instruct field as a separate input.
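The split between main text and instruct text maps naturally onto a request with two separate fields. The sketch below is hypothetical: the payload keys and engine names are not Sonna's API, and rejecting instructions for Kokoro (rather than silently ignoring them) is a design choice assumed here, based only on the Kokoro table listing instruct as not supported.

```python
from typing import Optional


def build_request(engine: str, speaker: str, text: str,
                  instruct: Optional[str] = None) -> dict:
    """Assemble a hypothetical generation payload with separate text and instruct fields."""
    if not text.strip():
        raise ValueError("main text is required")
    payload = {"engine": engine, "speaker": speaker, "text": text}
    instruct = (instruct or "").strip()
    if instruct:
        if engine != "qwen-customvoice":
            # Kokoro has no instruct support: the preset voice carries the style.
            raise ValueError(f"{engine} does not accept delivery instructions")
        payload["instruct"] = instruct  # how to say it, kept apart from what to say
    return payload


request = build_request("qwen-customvoice", "Ryan", "Welcome back to the show.",
                        instruct="Warm and friendly, conversational tone")
```

A blank instruct field is dropped rather than sent, so the model falls back to the speaker's default delivery.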

Qwen CustomVoice at a Glance

Property | Value
Parameters | 1.7B / 0.6B
Languages | Chinese, English, Japanese, Korean (speakers' native languages); 10 languages supported overall
Voices | 9 curated preset speakers
VRAM | ~3.5 GB (1.7B), ~1.2 GB (0.6B)
Instruct | Yes, natural-language style control
Cloning | No (the paired Qwen3-TTS Base engine handles cloning)
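The two model sizes trade quality for memory, so a simple rule is to prefer the 1.7B default and drop to 0.6B when VRAM is short. A sketch under stated assumptions: the VRAM figures are the approximate values from the table above, and the 0.5 GB safety headroom is a hypothetical margin, not a Sonna setting.

```python
from typing import Optional

# Approximate VRAM needs from the table above, in GB.
VRAM_NEEDED_GB = {"1.7B": 3.5, "0.6B": 1.2}


def pick_customvoice_size(free_vram_gb: float,
                          headroom_gb: float = 0.5) -> Optional[str]:
    """Prefer the 1.7B default; use 0.6B on smaller GPUs; None if neither fits."""
    for size in ("1.7B", "0.6B"):  # ordered by preference
        if free_vram_gb >= VRAM_NEEDED_GB[size] + headroom_gb:
            return size
    return None
```

When the function returns `None`, Kokoro is the natural fallback, since it runs without a GPU.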

Cloning vs Preset — Quick Decision

You want… | Use
To replicate a specific person's voice | Voice Cloning
Production-ready voices with no audio prep | Kokoro or Qwen CustomVoice
The smallest possible footprint (CPU-only) | Kokoro
Fine control over delivery (tone, pace, emotion) | Qwen CustomVoice
The broadest language coverage | Voice Cloning via Chatterbox Multilingual (23 languages)
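The decision table can be read as a short priority list. In this sketch the function and flag names are hypothetical, and the order used when several rows apply is an assumption: language coverage and voice replication come first because preset catalogs are fixed, and CPU-only wins over delivery control because only Kokoro is described as running without a GPU.

```python
def recommend_engine(replicate_specific_voice: bool = False,
                     broadest_languages: bool = False,
                     cpu_only: bool = False,
                     delivery_control: bool = False) -> str:
    """Encode the decision table above as an ordered set of checks."""
    if broadest_languages:
        return "Voice Cloning via Chatterbox Multilingual"
    if replicate_specific_voice:
        # Preset catalogs are fixed, so a specific voice needs a cloning engine.
        return "Voice Cloning"
    if cpu_only:
        return "Kokoro"
    if delivery_control:
        return "Qwen CustomVoice"
    # No special requirement: either preset engine works without audio prep.
    return "Kokoro or Qwen CustomVoice"
```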

Limitations

Preset voices are fixed — you can't fine-tune or modify the underlying voice. If you want a specific voice that isn't in the catalog, use a cloning engine and provide a reference sample.

  • Preset voices can't be exported as audio for use in other Sonna installations; an export carries only profile metadata (engine + voice ID), so it works only where the same engine is installed
  • The Kokoro voice catalog is set by the upstream model: new voices appear only when hexgrad publishes new model releases
  • Qwen CustomVoice's 9 speakers are part of the model checkpoint, so the same constraint applies
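The export limitation follows from what a preset profile actually is: a pointer to an engine and a voice ID, with no audio behind it. The sketch below uses a hypothetical JSON schema (field names such as `voice_id` and `type` are illustrative, not Sonna's export format) to show why an import can succeed only when the target installation has the same engine.

```python
import json
from typing import Set


def export_preset_profile(name: str, engine: str, voice_id: str) -> str:
    """Serialize a preset profile as metadata only; there is no audio to include."""
    record = {"type": "preset", "name": name, "engine": engine, "voice_id": voice_id}
    return json.dumps(record, sort_keys=True)


def import_preset_profile(blob: str, installed_engines: Set[str]) -> dict:
    """Accept the profile only if this installation has the engine its voice lives in."""
    record = json.loads(blob)
    if record.get("type") != "preset":
        raise ValueError("not a preset-voice profile")
    if record["engine"] not in installed_engines:
        raise LookupError(f"engine {record['engine']!r} is not installed")
    return record


exported = export_preset_profile("Narrator", engine="kokoro", voice_id="bf_emma")
```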
