
Remote Mode

Connect to a GPU server for faster generation

Overview

Remote Mode allows you to run the Sonna backend on a separate machine (like a GPU server) while using the desktop app on your local machine.

Use Cases

  • No local GPU - Use a cloud GPU or remote workstation
  • Faster generation - Leverage powerful remote hardware
  • Shared infrastructure - Multiple users connect to one server
  • Laptop workflows - Keep your laptop cool and battery-efficient

Architecture

In Remote Mode, the Sonna desktop app (running on your local machine) communicates with the backend server (running on a remote machine) via HTTP. The local app provides only the user interface, while the remote server handles all the heavy processing including the TTS models, API endpoints, and audio generation.

Setting Up Remote Mode

On the Server

# Clone the repo
git clone https://github.com/jamiepine/sonna.git
cd sonna/backend

# Install Python dependencies
pip install -r requirements.txt

# Engines with incompatible transitive pins — install with --no-deps
pip install --no-deps chatterbox-tts
pip install --no-deps hume-tada

# Qwen3-TTS from source
pip install git+https://github.com/QwenLM/Qwen3-TTS.git

Alternatively, run "just setup" from the repo root; it performs all of the steps above.

# Allow external connections
uvicorn main:app --host 0.0.0.0 --port 17493

Binding to 0.0.0.0 makes the server listen on every network interface, so any machine that can reach this host can reach the API. Restrict access with a firewall or VPN.

# Ubuntu/Debian
sudo ufw allow 17493

# Or use your cloud provider's firewall settings

On the Client

In Sonna, go to Settings → Server

Toggle Use Remote Server

Enter the server URL, replacing <server-ip> with your server's IP address:

http://<server-ip>:17493

Click Test Connection to verify
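If Test Connection fails, it can help to check from a script whether the port is reachable at all. The following is a minimal sketch and not part of Sonna: can_reach is a hypothetical helper that parses the server URL and attempts a plain TCP connection. A True result only shows that something is listening on that port, not that the Sonna API itself is healthy.

```python
import socket
from urllib.parse import urlsplit


def can_reach(server_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; not part of Sonna.
    """
    parts = urlsplit(server_url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {server_url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling can_reach("http://<server-ip>:17493") with your server's address returns False when the firewall blocks the port or the server is not running, which narrows down where to look next.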

Cloud Deployment

AWS EC2

# Launch a GPU instance (e.g., g4dn.xlarge)
# Install dependencies
# Start server with --host 0.0.0.0

Vast.ai

# Rent a GPU instance
# SSH in and clone repo
# Start server

RunPod

# Deploy a pod with CUDA support
# Install Sonna backend
# Expose port 17493

Security Considerations

The API currently has no authentication. Only use on trusted networks or with a VPN.

Best Practices:

  • Use a VPN (WireGuard, Tailscale) instead of exposing to the internet
  • Run behind a reverse proxy with authentication (nginx + basic auth)
  • Use HTTPS with SSL certificates
  • Use firewall rules to limit access to specific IPs
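If the server sits behind a reverse proxy with basic auth, any script that calls the API directly has to send credentials with each request. Here is a minimal sketch using only the Python standard library. The host name, credentials, and function name are placeholders, and it assumes the proxy uses standard HTTP Basic authentication served over HTTPS.

```python
import base64
import urllib.request


def authed_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header.

    Hypothetical helper for illustration; not part of Sonna.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder host and credentials; replace with your proxy's values.
req = authed_request("https://sonna.example.com/", "alice", "s3cret")
# Passing req to urllib.request.urlopen would send the authenticated request.
```

Basic auth only encodes the credentials, it does not encrypt them, so this is only safe when combined with HTTPS as recommended above.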

Performance

Expected performance on various GPUs:

GPU              Generation Speed
RTX 4090         ~2-3s per 10 words
RTX 3090         ~3-4s per 10 words
RTX 3060         ~5-7s per 10 words
CPU (12-core)    ~20-30s per 10 words

A GPU with 8GB+ VRAM is recommended for best performance.
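To turn the figures above into a rough time budget, scale the per-10-words range by the length of your script. The sketch below assumes generation time grows linearly with word count, which is an approximation and not something the figures guarantee. Model load time and network transfer are ignored, and the function name is illustrative.

```python
def estimate_seconds(word_count: int, low_per_10: float, high_per_10: float) -> tuple[float, float]:
    """Scale a "seconds per 10 words" range to a full script.

    Assumes linear scaling with word count (an approximation).
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    factor = word_count / 10
    return (low_per_10 * factor, high_per_10 * factor)


# A 500-word script on an RTX 3090 (~3-4s per 10 words).
low, high = estimate_seconds(500, 3, 4)
print(f"about {low:.0f}-{high:.0f} seconds")  # about 150-200 seconds
```

The same script on the 12-core CPU row (~20-30s per 10 words) works out to roughly 1000-1500 seconds, which is the kind of gap that makes a remote GPU worthwhile.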

Troubleshooting

See the Troubleshooting Guide for common issues.
