Remote Mode
Connect to a GPU server for faster generation
Overview
Remote Mode allows you to run the Sonna backend on a separate machine (like a GPU server) while using the desktop app on your local machine.
Use Cases
- No local GPU - Use a cloud GPU or remote workstation
- Faster generation - Leverage powerful remote hardware
- Shared infrastructure - Multiple users connect to one server
- Laptop workflows - Keep your laptop cool and battery-efficient
Architecture
In Remote Mode, the Sonna desktop app (running on your local machine) communicates with the backend server (running on a remote machine) via HTTP. The local app provides only the user interface, while the remote server handles all the heavy processing including the TTS models, API endpoints, and audio generation.
Setting Up Remote Mode
On the Server
# Clone the repo
git clone https://github.com/jamiepine/sonna.git
cd sonna/backend
# Install Python dependencies
pip install -r requirements.txt
# Engines with incompatible transitive pins — install with --no-deps
pip install --no-deps chatterbox-tts
pip install --no-deps hume-tada
# Qwen3-TTS from source
pip install git+https://github.com/QwenLM/Qwen3-TTS.git
Alternatively, run `just setup` from the repo root, which handles all of this.
# Allow external connections
uvicorn main:app --host 0.0.0.0 --port 17493
This exposes the server to your network. Use a firewall or VPN for security.
# Ubuntu/Debian
sudo ufw allow 17493
# Or use your cloud provider's firewall settings
On the Client
1. In Sonna, go to Settings → Server.
2. Toggle Use Remote Server.
3. Enter the server URL: http://<server-ip>:17493 (replace <server-ip> with your server's IP address).
4. Click Test Connection to verify.
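If Test Connection fails, it can help to check reachability from the client machine outside the app. The following is a minimal sketch, not the app's own check: `server_responds` is a hypothetical helper, and because the backend's routes are not listed here, it treats any HTTP response (even an error status such as 404) as proof that the server is up and the port is open.

```python
import http.client
import urllib.error
import urllib.request


def server_responds(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at base_url, False otherwise.

    Hypothetical helper for illustration; it does not call any
    Sonna-specific endpoint.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (for example 404).
        # It is still reachable, which is all this check cares about.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, blocked port,
        # or a non-HTTP service listening on the port.
        return False


# Example call (replace <server-ip> with your server's IP address):
# server_responds("http://<server-ip>:17493")
```

A False result usually points to the firewall, the bind address (`--host 0.0.0.0`), or the wrong IP rather than to the desktop app.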
Cloud Deployment
AWS EC2
# Launch a GPU instance (e.g., g4dn.xlarge)
# Install dependencies
# Start server with --host 0.0.0.0
Vast.ai
# Rent a GPU instance
# SSH in and clone repo
# Start server
RunPod
# Deploy a pod with CUDA support
# Install Sonna backend
# Expose port 17493
Security Considerations
The API currently has no authentication. Only use on trusted networks or with a VPN.
Best Practices:
- Use a VPN (WireGuard, Tailscale) instead of exposing to the internet
- Run behind a reverse proxy with authentication (nginx + basic auth)
- Use HTTPS with SSL certificates
- Use firewall rules to limit access to specific IPs
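As a rough illustration of the practices above, the sketch below flags a server URL that points at a publicly routable IP address, where an unauthenticated API would be exposed. `remote_url_warnings` is a hypothetical helper, not part of Sonna. It only classifies IP literals and does not resolve DNS names, so it is a sanity check rather than a substitute for a firewall or VPN.

```python
import ipaddress
from urllib.parse import urlsplit


def remote_url_warnings(url: str) -> list[str]:
    """Return warnings for a remote server URL (empty list if none).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    try:
        address = ipaddress.ip_address(parts.hostname or "")
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it cannot tell
        # whether the host is on a private network or the public internet.
        return ["host is a name, not an IP; check where it resolves"]

    warnings = []
    if address.is_global:
        # Publicly routable address: anyone who can reach it can use the API.
        warnings.append("host is a public IP and the API has no authentication")
        if parts.scheme != "https":
            warnings.append("traffic to a public IP is not encrypted (plain HTTP)")
    return warnings


# LAN and VPN-style addresses are not publicly routable, so no warnings:
# remote_url_warnings("http://192.168.1.20:17493")
```

The check relies on `ipaddress`'s `is_global` property, which is false for private LAN ranges and for the shared address space that some VPN overlays use.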
Performance
Expected performance on various GPUs:
| GPU | Generation Speed |
|---|---|
| RTX 4090 | ~2-3s per 10 words |
| RTX 3090 | ~3-4s per 10 words |
| RTX 3060 | ~5-7s per 10 words |
| CPU (12-core) | ~20-30s per 10 words |
A GPU with 8GB+ VRAM is recommended for best performance.
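To turn the table into a rough estimate for a longer script, the per-10-word figures can be scaled by word count. This sketch assumes generation time grows linearly with text length, which the table does not guarantee. Model load time and network transfer are ignored, and `estimate_seconds` is a hypothetical helper.

```python
# Seconds per 10 words as (low, high), copied from the performance table.
SECONDS_PER_10_WORDS = {
    "RTX 4090": (2, 3),
    "RTX 3090": (3, 4),
    "RTX 3060": (5, 7),
    "CPU (12-core)": (20, 30),
}


def estimate_seconds(word_count: int, hardware: str) -> tuple[float, float]:
    """Return a (low, high) estimate in seconds, assuming linear scaling."""
    low, high = SECONDS_PER_10_WORDS[hardware]
    scale = word_count / 10
    return (low * scale, high * scale)


print(estimate_seconds(150, "RTX 4090"))       # (30.0, 45.0)
print(estimate_seconds(150, "CPU (12-core)"))  # (300.0, 450.0)
```

For a 150-word paragraph, this works out to roughly 30-45 seconds on an RTX 4090 versus 5-7.5 minutes on a 12-core CPU.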
Troubleshooting
See the Troubleshooting Guide for common issues.