Docker Deployment
Run Sonna as a headless server with a web UI using Docker
Overview
Sonna can run as a Docker container with a full web UI -- no desktop app required. This is ideal for headless servers, shared GPU machines, or self-hosted deployments.
Quick Start
git clone https://github.com/jamiepine/sonna.git
cd sonna
docker compose up
Open http://localhost:17493 in your browser. The full Sonna UI is served directly from the backend.
The first build takes a few minutes (compiling the frontend, installing Python dependencies). Subsequent starts are fast thanks to Docker layer caching.
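If you start the container from a script, it can help to wait until the server answers before continuing. The sketch below assumes `curl` is installed on the host and polls the `/health` endpoint that the container's own health check uses. The function name `wait_for_sonna` and its retry defaults are illustrative and not part of Sonna.

```shell
# Hypothetical helper: poll Sonna until it answers or the attempts run out.
# Usage: wait_for_sonna [url] [attempts] [delay_seconds]
wait_for_sonna() {
  url="${1:-http://localhost:17493/health}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timeout"
  return 1
}
```

After `docker compose up -d`, calling `wait_for_sonna` prints `ready` as soon as the endpoint responds, or `timeout` once the attempts are used up.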
How It Works
The Docker image uses a 3-stage build:
- Frontend -- builds the React SPA with Bun and Vite
- Backend -- installs Python dependencies and TTS model packages
- Runtime -- combines both into a minimal image running the FastAPI server
The backend serves the web UI automatically when the built frontend is present. All API routes work exactly as they do in the desktop app.
Configuration
docker-compose.yml
The default docker-compose.yml binds to localhost only, mounts persistent volumes for data and model cache, and sets sensible resource limits:
services:
  sonna:
    build: .
    container_name: sonna
    restart: unless-stopped
    ports:
      - "127.0.0.1:17493:17493"
    volumes:
      - ./output:/app/data/generations
      - sonna-data:/app/data
      - huggingface-cache:/home/sonna/.cache/huggingface
    environment:
      - LOG_LEVEL=info
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
Exposing to Your Network
By default the port is published only on 127.0.0.1, so the server is reachable only from the host itself. To allow other machines on your network to connect, change the port binding:
ports:
  - "0.0.0.0:17493:17493"
The API has no built-in authentication. Only expose to trusted networks, or put a reverse proxy with auth in front of it.
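To confirm which binding is actually in effect, you can ask Compose for the published address with `docker compose port sonna 17493` and classify the result. The helper below is a small sketch. The name `binding_scope` and its output labels are made up for illustration, and it only looks at the single address string it is given.

```shell
# Hypothetical helper: classify an address as printed by `docker compose port`.
# Usage: binding_scope "127.0.0.1:17493"
binding_scope() {
  case "$1" in
    "") echo "not-published" ;;
    127.*|"[::1]":*) echo "local-only" ;;
    0.0.0.0:*|"[::]":*) echo "all-interfaces" ;;
    *) echo "single-interface" ;;
  esac
}
```

Running `binding_scope "$(docker compose port sonna 17493)"` prints `local-only` for the default configuration and `all-interfaces` after the change above.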
Environment Variables
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | info | Logging verbosity (debug, info, warning, error) |
| SONNA_MODELS_DIR | (HuggingFace cache) | Custom path for model storage |
| SONNA_CORS_ORIGINS | (local origins) | Additional CORS origins, comma-separated |
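One way to set these variables without editing the default file is a Compose override file: `docker compose` merges `docker-compose.override.yml` automatically when it sits next to `docker-compose.yml`. The sketch below writes such a file. The function name `write_sonna_override` and the default values are illustrative assumptions, and the example origin reuses the hostname from the reverse proxy section.

```shell
# Hypothetical helper: write a Compose override that sets Sonna's variables.
# Usage: write_sonna_override [file] [log_level] [cors_origins]
write_sonna_override() {
  file="${1:-docker-compose.override.yml}"
  log_level="${2:-debug}"
  cors_origins="${3:-https://sonna.example.com}"
  # Entries here replace same-named variables from docker-compose.yml.
  cat > "$file" <<EOF
services:
  sonna:
    environment:
      - LOG_LEVEL=${log_level}
      - SONNA_CORS_ORIGINS=${cors_origins}
EOF
}
```

After writing the file, `docker compose config` prints the merged configuration so you can check the result before restarting the container.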
Resource Limits
The default compose file limits the container to 4 CPUs and 8GB RAM. Adjust these based on your hardware:
deploy:
  resources:
    limits:
      cpus: '8'
      memory: 16G
TTS model inference is memory-intensive. 8GB is the minimum for running a single engine. 16GB+ is recommended if you want multiple engines loaded simultaneously.
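To judge whether the limit is too tight, you can compare current memory use against it with `docker stats`. The sketch below reads the percentage Docker reports for the container and flags it when it crosses a threshold. The function name `sonna_memory_check` and the default threshold of 90 are arbitrary illustrative choices.

```shell
# Hypothetical helper: warn when memory use approaches the container limit.
# Usage: sonna_memory_check [container_name] [threshold_percent]
sonna_memory_check() {
  name="${1:-sonna}"
  threshold="${2:-90}"
  # MemPerc is reported relative to the container's memory limit, e.g. "87.31%".
  pct="$(docker stats --no-stream --format '{{.MemPerc}}' "$name")" || return 1
  whole="${pct%%.*}"     # drop the fractional part: "87.31%" -> "87"
  whole="${whole%\%}"    # drop a trailing percent sign if there was no fraction
  case "$whole" in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;
  esac
  if [ "$whole" -ge "$threshold" ]; then
    echo "high ($pct)"
  else
    echo "ok ($pct)"
  fi
}
```

Calling `sonna_memory_check` while a generation is running gives a rough idea of how much headroom is left before the limit is reached.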
Volumes
| Volume | Container Path | Purpose |
|---|---|---|
| ./output | /app/data/generations | Generated audio files (bind-mount, easy access from host) |
| sonna-data | /app/data | Profiles, database, cache |
| huggingface-cache | /home/sonna/.cache/huggingface | Downloaded models (persists across rebuilds) |
The huggingface-cache volume is important -- without it, models would be re-downloaded every time the container is rebuilt.
GPU Acceleration
NVIDIA GPU (CUDA)
To use your NVIDIA GPU inside the container, install the NVIDIA Container Toolkit and add GPU access to your compose file:
services:
  sonna:
    build: .
    # ... existing config ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
AMD GPU (ROCm)
For AMD GPUs, use the ROCm runtime:
services:
  sonna:
    build: .
    # ... existing config ...
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
CPU Only
The default configuration runs on CPU. This works, but generation is slower than on a GPU. LuxTTS is the fastest engine on CPU (150x realtime).
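After changing the GPU settings, a quick check is to ask PyTorch inside the running container whether it can see an accelerator. ROCm builds of PyTorch report through the same `torch.cuda` interface as CUDA builds. The sketch below assumes that `python` is on the container's PATH and can import `torch`, which this page does not confirm. The function name `sonna_device` is illustrative.

```shell
# Hypothetical helper: ask PyTorch in the container whether it sees a GPU.
# Assumes `python` is on the container's PATH and can import torch.
sonna_device() {
  # -T disables TTY allocation so the output can be captured cleanly.
  out="$(docker compose exec -T sonna python -c 'import torch; print(torch.cuda.is_available())')" || {
    echo "unknown"
    return 1
  }
  case "$out" in
    True*) echo "gpu" ;;
    False*) echo "cpu" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

A result of `gpu` only shows that PyTorch can see a device. It does not by itself prove that a given engine is using it.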
Security
The Docker image follows security best practices:
- Non-root user -- the server runs as sonna, not root
- Localhost binding -- only accessible from the host machine by default
- Health checks -- automatic restart if the server hangs (/health endpoint polled every 30s)
- CORS restricted -- only local origins allowed by default
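Docker records the outcome of the health check, so you can read it from the host with `docker inspect`. The sketch below assumes the container is named `sonna`, as in the default compose file. The function name `sonna_health` is illustrative.

```shell
# Hypothetical helper: print Docker's health status for the container.
# Succeeds only when the status is "healthy".
# Usage: sonna_health [container_name]
sonna_health() {
  name="${1:-sonna}"
  # Typical values are "starting", "healthy", and "unhealthy".
  status="$(docker inspect --format '{{.State.Health.Status}}' "$name")" || return 2
  echo "$status"
  [ "$status" = "healthy" ]
}
```

Because the function's exit status reflects the result, it can be used directly in a condition such as `if sonna_health >/dev/null; then ...; fi`.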
Running Behind a Reverse Proxy
For production deployments, put Sonna behind nginx or Caddy with TLS and authentication:
server {
    listen 443 ssl;
    server_name sonna.example.com;
    ssl_certificate /etc/ssl/certs/sonna.pem;
    ssl_certificate_key /etc/ssl/private/sonna.key;
    auth_basic "Sonna";
    auth_basic_user_file /etc/nginx/.htpasswd;
    location / {
        proxy_pass http://127.0.0.1:17493;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Troubleshooting
Container starts but UI shows JSON
If you see {"message": "sonna API", ...} instead of the web UI, the frontend build may have failed during the Docker build. Check the build logs:
docker compose build --no-cache
Look for errors in the "Build frontend" stage.
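After rebuilding, you can check from the command line whether the root URL now returns the web UI or still returns the API's JSON. The sketch below reads a response body on standard input and makes a rough guess from its first character. The function name `classify_root` and its output labels are illustrative.

```shell
# Hypothetical helper: read a response body on stdin and guess what served it.
# Usage: curl -fsS http://localhost:17493/ | classify_root
classify_root() {
  body="$(cat)"
  case "$body" in
    '{'*) echo "api-only" ;;   # JSON such as {"message": "sonna API", ...}
    '<'*) echo "web-ui" ;;     # HTML document, so the built frontend is served
    *) echo "unknown" ;;
  esac
}
```

A result of `api-only` means the backend is running but the built frontend was not found in the image.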
Models downloading on every restart
Make sure the huggingface-cache volume is configured. Without it, the model cache lives in the container's writable layer and is lost whenever the container is removed or recreated:
volumes:
  - huggingface-cache:/home/sonna/.cache/huggingface
Out of memory
TTS models are large. If the container is killed by the OOM killer, increase the memory limit:
deploy:
  resources:
    limits:
      memory: 16G
Port already in use
# Check what's using port 17493
lsof -i :17493
# Or use a different port
ports:
  - "127.0.0.1:8080:17493"
Prebuilt Images (Coming Soon)
We plan to publish prebuilt Docker images to GitHub Container Registry so you won't need to build locally:
# Not available yet — coming in a future release
docker run -p 17493:17493 ghcr.io/jamiepine/sonna:latest
The CPU image will be ~3-4 GB (Python + PyTorch + TTS packages). A separate CUDA tag (~6-8 GB) will be available for NVIDIA GPU users. This is normal for ML containers.
For now, use docker compose up to build from source as described above.
Connecting the Desktop App
You can also use the desktop app as a frontend for a Docker-hosted backend. In the desktop app, go to Settings -> Server, enable Remote Mode, and enter http://<server-ip>:17493.
See the Remote Mode guide for details.