Sonna auto-detects available accelerators on first launch and picks the fastest backend it can use. For most people this just works — open the app and you're already on the right backend.
This page is for the cases where it doesn't:
You have a GPU but Sonna is running on CPU
You upgraded GPUs (especially to RTX 50-series / Blackwell) and generation broke
You want to switch backends manually (e.g. force MLX over PyTorch on Apple Silicon)
You see [UNSUPPORTED - see logs] next to your GPU in Settings
On M-series Macs, Sonna ships an MLX-optimized backend that runs on the Apple Silicon GPU through Metal. It's 4-5x faster than the PyTorch (CPU/Metal) path for supported engines.
| Engine | MLX support | Notes |
| --- | --- | --- |
| Qwen3-TTS | ✅ Native | Uses MLX exclusively when available |
| Chatterbox / Turbo | PyTorch MPS | Falls back to Metal via PyTorch |
| LuxTTS | PyTorch MPS | |
| TADA | PyTorch MPS | |
| Kokoro | PyTorch MPS | Requires PYTORCH_ENABLE_MPS_FALLBACK=1 |
| Qwen CustomVoice | PyTorch MPS | |
| Whisper (transcribe) | ✅ Native | MLX-Whisper is the default on Apple Silicon |
The Whisper Turbo + MLX combo dropped transcription latency from ~20s to ~2-3s on M-series chips (see CHANGELOG entry for v0.1.10).
Sonna doesn't bundle CUDA into the main installer (it would balloon downloads to multi-gigabyte territory for users who don't have an NVIDIA GPU). Instead, when you first need it, the app downloads a separate CUDA backend binary that contains the PyTorch + CUDA runtime.
If an NVIDIA GPU is detected, you'll see "Install CUDA backend" in the GPU panel
The app downloads two archives separately:
Server core (~200-400 MB) — versioned with each Sonna release
CUDA libs (~4 GB) — the heavy PyTorch + CUDA DLLs, versioned independently
Sonna restarts to swap in the CUDA backend
The split-archive design (added in v0.4) means most Sonna upgrades only redownload the small server-core archive. The 4 GB libs archive is only refreshed when the underlying CUDA toolkit or torch major version changes.
When a new Sonna release ships, the GPU panel checks whether the installed CUDA backend matches the release's server-core. If only the core changed (the typical case), it pulls the new core in the background. If the libs version changed (rare; this only happens on bumps such as cu126 → cu128), you'll be prompted to confirm the larger download.
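The update decision described above can be sketched as a comparison of two version fields. This is a minimal illustration, not Sonna's actual updater: `BackendVersions`, `plan_update`, the archive labels, and the version strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendVersions:
    """Versions of the two archives (hypothetical structure for illustration)."""
    core: str  # server-core version, tied to a Sonna release
    libs: str  # CUDA libs tag, e.g. "cu128"


def plan_update(installed: BackendVersions, release: BackendVersions) -> list[str]:
    """Return which archives need downloading, smallest first."""
    downloads = []
    if installed.core != release.core:
        downloads.append("server-core")  # small archive, fetched in the background
    if installed.libs != release.libs:
        downloads.append("cuda-libs")    # ~4 GB archive, needs user confirmation
    return downloads


# Typical upgrade: only the core changed.
print(plan_update(BackendVersions("0.4.1", "cu128"), BackendVersions("0.4.2", "cu128")))
# Rare upgrade: the CUDA tag changed as well.
print(plan_update(BackendVersions("0.4.2", "cu126"), BackendVersions("0.5.0", "cu128")))
```

Keeping the two versions separate is what lets a routine upgrade skip the large libs archive entirely.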
Sonna auto-detects Arc GPUs and routes through Intel's PyTorch XPU backend (powered by IPEX — Intel Extension for PyTorch). No extra installation step beyond the standard Sonna install.
Verify it's working:
Settings → GPU should show XPU followed by your Arc model name (e.g. XPU (Intel Arc A770))
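If you run Sonna's backend from your own Python environment, a similar check can be made directly against PyTorch. The sketch below assumes a PyTorch build that exposes `torch.xpu`; the helper name `describe_accelerator` is hypothetical, and a stand-in object is used so the example runs without PyTorch installed.

```python
from types import SimpleNamespace


def describe_accelerator(torch_mod) -> str:
    """Return a label such as 'XPU (<device name>)' or 'CPU' for a torch-like module."""
    xpu = getattr(torch_mod, "xpu", None)  # absent on PyTorch builds without XPU support
    if xpu is not None and xpu.is_available():
        return f"XPU ({xpu.get_device_name(0)})"
    return "CPU"


# Stand-in for the real torch module (assumption: for illustration only).
fake_torch = SimpleNamespace(
    xpu=SimpleNamespace(
        is_available=lambda: True,
        get_device_name=lambda index: "Intel Arc A770",
    )
)
print(describe_accelerator(fake_torch))
```

With PyTorch installed, call `describe_accelerator(torch)` after `import torch`. The device name reported by the driver may be worded differently from the label shown in Settings.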
This backend is the fallback for Windows users whose GPU is neither NVIDIA nor Intel Arc (older AMD discrete cards, integrated GPUs, etc.). It is slower than CUDA and XPU but still provides some acceleration over CPU.
Auto-selected when no other GPU backend is available.
Sonna 0.4 added a runtime check that compares your GPU's compute capability against the architectures the bundled PyTorch was compiled for. If they don't match, you'll see:
A startup log line: WARNING: GPU COMPATIBILITY: <your GPU> is not supported by this PyTorch build...
The GPU label in Settings shows [UNSUPPORTED - see logs]
The /health API returns a populated gpu_compatibility_warning field
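For scripted checks, the `gpu_compatibility_warning` field can be read from the /health response body. The sketch below only parses a JSON string; the helper name `gpu_warning` and every other field in the sample bodies are assumptions, and the server address is not covered on this page.

```python
import json


def gpu_warning(health_body: str):
    """Return the compatibility warning from a /health JSON body, or None."""
    payload = json.loads(health_body)
    warning = payload.get("gpu_compatibility_warning")
    return warning or None  # treat "", null, and a missing key the same way


# Sample bodies; only gpu_compatibility_warning comes from the docs above.
ok_body = '{"status": "ok", "gpu_compatibility_warning": null}'
bad_body = '{"status": "ok", "gpu_compatibility_warning": "GPU is not supported by this PyTorch build"}'

print(gpu_warning(ok_body))
print(gpu_warning(bad_body))
```

The body itself can be fetched with `urllib.request.urlopen` against whatever address your local Sonna server listens on.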
On NVIDIA: install the CUDA backend (Settings → GPU)
On Intel Arc: confirm IPEX detection in startup logs; restart the app after a driver update
On AMD Linux: check HSA_OVERRIDE_GFX_VERSION is set
'no kernel image is available' / 'CUDA error'
Almost always means the bundled PyTorch doesn't have kernels for your GPU's compute capability.
Update to Sonna ≥ 0.4.0 (Blackwell support added there)
Reinstall the CUDA backend
If still broken, install PyTorch nightly via Remote Mode
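To see why this error appears, it helps to compare the GPU's compute capability with the architectures a PyTorch build was compiled for. The following is a simplified sketch, not Sonna's actual runtime check: it assumes a compiled `sm_XY` kernel runs on a GPU with the same major version and an equal or higher minor version, and it ignores PTX (`compute_*`) entries that could be JIT-compiled. The helper names are hypothetical.

```python
def parse_arch_list(arch_list):
    """Turn entries like 'sm_90' into (major, minor) tuples; skip other entries."""
    caps = []
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if kind == "sm" and digits.isdigit():
            # The last digit is the minor version: 'sm_120' -> (12, 0).
            caps.append((int(digits[:-1]), int(digits[-1])))
    return caps


def has_kernels(capability, arch_list) -> bool:
    """True if a compiled sm_XY shares the GPU's major version with minor <= the GPU's."""
    major, minor = capability
    return any(m == major and n <= minor for m, n in parse_arch_list(arch_list))


# Illustrative arch list for a build without Blackwell kernels.
built_for = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(has_kernels((8, 9), built_for))   # Ada-class GPU
print(has_kernels((12, 0), built_for))  # Blackwell-class GPU
```

On a machine with the CUDA backend installed, the real inputs come from `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`.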
Out of memory (CUDA)
Switch to a smaller model size (e.g. Qwen3 0.6B instead of 1.7B)
Use Settings → Models to unload other engines you're not using
low_cpu_mem_usage is already on for CPU; for CUDA, the engine's device_map handles offload automatically
Close other GPU applications
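Before changing models, it can help to see how much VRAM is actually in use. The sketch below formats a free/total pair of byte counts; `vram_report` is a hypothetical helper, and the sample numbers are made up.

```python
def vram_report(free_bytes: int, total_bytes: int) -> str:
    """Summarize VRAM usage from free and total byte counts."""
    gib = 1024 ** 3
    used = total_bytes - free_bytes
    return (
        f"{used / gib:.1f} GiB used of {total_bytes / gib:.1f} GiB "
        f"({used / total_bytes:.0%})"
    )


# Sample values: 2 GiB free on an 8 GiB card.
print(vram_report(2 * 1024 ** 3, 8 * 1024 ** 3))
```

With PyTorch and a CUDA device available, the inputs come from `free, total = torch.cuda.mem_get_info()`, which returns both values in bytes.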
MPS fallback errors on macOS
Some operations don't have a Metal implementation. Sonna sets PYTORCH_ENABLE_MPS_FALLBACK=1 for engines that need it (notably Kokoro), but if you launch from a custom env, set it manually:
export PYTORCH_ENABLE_MPS_FALLBACK=1
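If you start the backend from a Python script rather than a shell, the same variable can be set in code. This is a minimal sketch with a hypothetical helper name; the variable generally needs to be in place before PyTorch is imported, so the call goes at the top of the entry point.

```python
import os


def enable_mps_fallback() -> str:
    """Set PYTORCH_ENABLE_MPS_FALLBACK unless the caller already chose a value."""
    return os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


enable_mps_fallback()
# import torch  # import PyTorch only after the variable is set
```

Using `setdefault` keeps any value you exported in the shell instead of overwriting it.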
Generation works but is slow on my GPU
Check that Settings → GPU shows your GPU (not CPU)
Check VRAM usage — you may be paging to system memory
Try a smaller model
For NVIDIA: confirm cu128 is installed (Settings → GPU → version)