Spadav/supersonic

SuperSonic WebUI

SuperSonic WebUI is a local text-to-speech gateway and browser interface for Supertone's Supertonic TTS engine. It exposes a mostly OpenAI-compatible /v1/audio/speech endpoint, maps OpenAI-style voice names to Supertonic voices, and includes a compact WebUI for generating speech, changing language, selecting output format, and using simulated streaming playback.

The app runs locally with FastAPI and serves both the API and static WebUI from the same process.

Features

  • Local Supertonic TTS through supertonic>=1.2.0
  • OpenAI-style speech endpoint: POST /v1/audio/speech
  • WebUI at /
  • Voice aliases: alloy, echo, fable, nova, onyx, shimmer
  • Native Supertonic voices: M1-M5, F1-F5
  • Output formats: wav, mp3, opus, flac, pcm
  • Simulated streaming endpoint: POST /v1/audio/stream
  • 31 language options
  • Text normalization for numbers, currencies, units, and phone numbers
  • Custom voice-style JSON upload support

Requirements

  • Python 3.10+ recommended
  • ffmpeg for MP3, Opus, and FLAC output
  • The first Supertonic model load may download model assets into ~/.cache/supertonic3
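A client or launcher script might want to know up front which output formats are usable on the current machine. The following is a minimal sketch, not the project's own code: the function name `usable_formats` and its `has_ffmpeg` parameter are hypothetical, and it assumes ffmpeg is discovered on `PATH`.

```python
import shutil

# Formats listed in this README, grouped by whether they need ffmpeg.
NATIVE_FORMATS = {"wav", "pcm"}
FFMPEG_FORMATS = {"mp3", "opus", "flac"}


def usable_formats(has_ffmpeg=None):
    """Return the set of output formats usable on this machine.

    has_ffmpeg is a hypothetical override for testing; when it is None,
    the function looks for an `ffmpeg` executable on PATH.
    """
    if has_ffmpeg is None:
        has_ffmpeg = shutil.which("ffmpeg") is not None
    return NATIVE_FORMATS | FFMPEG_FORMATS if has_ffmpeg else set(NATIVE_FORMATS)
```

Calling `usable_formats()` with no argument performs the real `PATH` lookup.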

Local Setup

python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt
.venv/bin/python server.py

Open:

http://127.0.0.1:8880

Health check:

curl http://127.0.0.1:8880/health

Docker

Build and run with Docker Compose:

docker compose up -d --build

Open:

http://127.0.0.1:8880

The compose file keeps the Supertonic model cache in a Docker volume so the model is not downloaded every time:

supertonic-cache:/root/.cache

Custom uploaded voices are persisted through:

./voices:/app/voices

Stop the container:

docker compose down

Stop the container and also remove the model cache volume:

docker compose down -v

API Usage

Generate WAV audio:

curl -o speech.wav \
  -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from SuperSonic.",
    "voice": "alloy",
    "response_format": "wav",
    "speed": 1.0,
    "language": "en",
    "steps": 8
  }'
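The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_speech_request` and `synthesize_to_file` are illustrative and not part of SuperSonic, and the base URL assumes the default local setup.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8880"  # default address from the setup section


def build_speech_request(text, voice="alloy", response_format="wav",
                         speed=1.0, language="en", steps=8):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "language": language,
        "steps": steps,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, **options):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# Usage (requires a running server):
# synthesize_to_file("Hello from SuperSonic.", "speech.wav")
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.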

List voices and languages:

curl http://127.0.0.1:8880/v1/audio/voices

List models:

curl http://127.0.0.1:8880/v1/models

Stream simulated PCM frames over Server-Sent Events:

curl -N \
  -X POST "http://127.0.0.1:8880/v1/audio/stream?chunk_size=300" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This is a simulated streaming test.",
    "voice": "echo",
    "language": "en",
    "speed": 1.0,
    "steps": 8
  }'
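Since the stream is delivered as Server-Sent Events, a client first has to split the response into events. The sketch below handles only the generic SSE framing (events separated by blank lines, payload on `data:` lines). The content of each payload is specific to SuperSonic and is not documented here, so the code leaves it undecoded; `iter_sse_data` is a hypothetical helper name.

```python
def iter_sse_data(lines):
    """Yield the joined `data:` payload of each complete SSE event.

    `lines` is any iterable of text lines, with or without line endings.
    An event that is not terminated by a blank line is discarded, as in
    the SSE specification.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "data":
            data_parts.append(value)


# Usage with a streaming HTTP response (assumes UTF-8 text lines):
# with urllib.request.urlopen(request) as response:
#     for payload in iter_sse_data(l.decode("utf-8") for l in response):
#         handle(payload)  # `handle` is a placeholder for your own code
```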

OpenAI Compatibility

Compatible with basic text-to-speech clients that call:

POST /v1/audio/speech

Supported OpenAI-style fields:

  • model
  • input
  • voice
  • response_format
  • speed

SuperSonic-specific fields:

  • language
  • steps

Important differences:

  • model is accepted but ignored; the app always uses Supertonic.
  • Authentication is not required or validated.
  • Error responses are FastAPI-style, not exact OpenAI error objects.
  • Streaming uses a custom /v1/audio/stream SSE format.
  • Transcription and translation endpoints are not implemented.

Configuration

Environment variables:

Variable              Default  Description
SUPERSONIC_HOST       0.0.0.0  Bind address
SUPERSONIC_PORT       8880     HTTP port
SUPERSONIC_LOG_LEVEL  INFO     Python logging level
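As an illustration of how these variables and their defaults fit together, here is a minimal sketch of reading them in Python. It is not taken from server.py; the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    log_level: str


def load_settings(env=None):
    """Read SuperSonic settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("SUPERSONIC_HOST", "0.0.0.0"),
        port=int(env.get("SUPERSONIC_PORT", "8880")),  # env values are strings
        log_level=env.get("SUPERSONIC_LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to exercise in isolation.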

Custom Voices

Upload a Supertonic voice-style JSON file from the WebUI, or call:

curl -X POST http://127.0.0.1:8880/v1/audio/voices/upload \
  -F "file=@voice.json"

Uploaded voices are stored in voices/ and can be referenced by name in the voice field.

Notes

  • WAV and PCM do not require ffmpeg.
  • MP3, Opus, and FLAC require ffmpeg.
  • Streaming is simulated: Supertonic synthesizes text chunks, then the server slices the generated PCM into small frames for browser playback.
  • CPU inference works, but generation speed depends on machine performance.
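The slicing step in simulated streaming can be pictured with a short sketch. This is an assumption-laden illustration rather than the server's actual code: it assumes 16-bit samples (2 bytes each), the name `slice_pcm` is hypothetical, and how the `chunk_size` query parameter maps to a frame size is not specified in this README.

```python
def slice_pcm(pcm, frame_bytes, sample_width=2):
    """Yield consecutive frames of raw PCM, each holding whole samples.

    frame_bytes is rounded down to a multiple of sample_width so that a
    sample is never split across two frames. The last frame may be shorter.
    """
    if frame_bytes < sample_width:
        raise ValueError("frame_bytes must hold at least one sample")
    step = frame_bytes - (frame_bytes % sample_width)
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]
```

Because the audio is fully generated before it is sliced, this approach trades true incremental synthesis for simpler playback logic in the browser.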
