psiTTSa

An offline Text-To-Speech service you can host at home. Web UI + REST API + Chrome extension. Configurable concurrency with job cancellation. MP3 output. Ephemeral temp storage + automatic audio cleanup.

What's inside

  • FastAPI server with a simple Web UI
  • Engines: Piper (primary) and pyttsx3 (secondary)
  • Configurable concurrency (limit parallel synth jobs)
  • Queue + cancel (cancel queued or running Piper jobs)
  • MP3 output (via ffmpeg)
  • HTTP Range support for /audio/{id}.mp3 (seek / progressive play)
  • HEAD /audio/{id}.mp3 readiness probe (used by clients to avoid downloading early)
  • Automatic TTL-based cleanup of generated audio
  • Ephemeral storage in system temp dir by default (override with PSITTSA_AUDIO_DIR)
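The concurrency limit and queue cancel listed above can be pictured with a small asyncio sketch. This is not psiTTSa's implementation: `run_job` and `demo` are hypothetical names, the synth step is replaced by a sleep, and a semaphore of size 2 stands in for the configured concurrency.

```python
import asyncio


async def run_job(job_id, seconds, limiter, log):
    """Hypothetical stand-in for a synth job; sleeps instead of running Piper."""
    async with limiter:  # at most N jobs hold the semaphore at once
        log.append(("start", job_id))
        await asyncio.sleep(seconds)
        log.append(("done", job_id))


async def demo():
    limiter = asyncio.Semaphore(2)  # assumed concurrency of 2
    log = []
    tasks = {i: asyncio.create_task(run_job(i, 0.05, limiter, log)) for i in range(3)}
    await asyncio.sleep(0.01)  # jobs 0 and 1 are running, job 2 is still queued
    tasks[2].cancel()  # cancelling a queued job means it never starts
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return log, results
```

Calling `asyncio.run(demo())` returns the event log and the per-job results; the cancelled job shows up as an `asyncio.CancelledError` instead of a normal result.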

Requirements

  • Python 3.9+
  • ffmpeg in PATH
  • For pyttsx3 on Linux: install espeak (sudo apt install espeak)
  • Optional GPU: install NVIDIA drivers and onnxruntime-gpu, then start the app with the --gpu flag.

Setup

Install the package (adds the psittsa-webapp console script):

python -m venv .venv
source .venv/bin/activate
pip install .
# GPU support (optional): pip install .[gpu]

Run it

Pick one of the following from the project root:

psittsa-webapp               # CPU
psittsa-webapp --gpu         # GPU (requires onnxruntime-gpu + drivers)

# or run the module directly
python -m psittsa.webapp

# or via uvicorn directly
uvicorn psittsa.webapp:app --host 0.0.0.0 --port 8000

The server uses the host and port from config.json (default 0.0.0.0:8000).

Open the Web UI: http://localhost:8000/

By default, generated MP3s are written to a folder in the system temp directory (for example /tmp/psittsa_audio) and served at /audio/{id}.mp3.

To persist audio across restarts, set a directory explicitly (env var wins over defaults):

export PSITTSA_AUDIO_DIR=/var/lib/psittsa/audio
psittsa-webapp

Or in Docker:

docker run --rm -p 8000:8000 \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  psittsa:cpu

The TTL (default 3600s) is configurable via audio_ttl_seconds; after expiration, files are removed and completed jobs transition to expired.
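As a rough illustration of TTL-based cleanup, the sketch below deletes MP3 files whose modification time is older than the TTL. It is an assumption about how such a sweep could look, not the project's code; `sweep_expired` is a hypothetical helper, and using file mtime as the age reference is also an assumption.

```python
import os
import time
from pathlib import Path


def sweep_expired(audio_dir, ttl_seconds, now=None):
    """Delete .mp3 files older than ttl_seconds; return the removed file stems."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(audio_dir).glob("*.mp3"):
        # Age is measured from the file's last modification time (an assumption).
        if now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed.append(path.stem)
    return sorted(removed)
```

A caller would run this periodically (for example every audio_clean_interval_seconds) and mark the returned ids as expired.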

Docker Usage

You can run psittsa in containers. Two build variants:

  1. CPU-only (default Dockerfile)
  2. GPU-enabled (Dockerfile.gpu, CUDA + onnxruntime-gpu)

For smaller production images (no build toolchain, no editable installs) use the multi-stage variants:

  1. Production CPU (Dockerfile.prod)
  2. Production GPU (Dockerfile.gpu.prod)

Build (CPU)

docker build -t psittsa:cpu .

Build (CPU - production image)

docker build -f Dockerfile.prod -t psittsa:cpu-prod .

Run (CPU)

docker run --rm -p 8000:8000 \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  -v $(pwd)/voices:/app/psittsa/voices:ro \
  psittsa:cpu

Open: http://localhost:8000/

If you bundled voices inside the package and don't need an external voices mount:

docker run --rm -p 8000:8000 psittsa:cpu

Build (GPU)

Requires NVIDIA drivers + nvidia-container-toolkit.

docker build -f Dockerfile.gpu -t psittsa:gpu .

Build (GPU - production image)

docker build -f Dockerfile.gpu.prod -t psittsa:gpu-prod .

Run (GPU)

The GPU image launches the app with the --gpu flag (the flag is part of the entrypoint script), so the run command only needs --gpus all:

docker run --rm -p 8000:8000 \
  --gpus all \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  -v $(pwd)/voices:/app/psittsa/voices:ro \
  -v $(pwd)/config.json:/app/psittsa/config.json:ro \
  psittsa:gpu

Multi-Arch Example (CPU image)

docker buildx build --platform linux/amd64,linux/arm64 -t youruser/psittsa:cpu --push .

Production Tips

  • Add --restart unless-stopped for a long-running service.
  • Use a reverse proxy (Traefik / Nginx) to terminate TLS.
  • Persist audio with the named volume (psittsa_audio) used in the run examples.
  • Mount voices/ read-only so updating models doesn't require a rebuild.
  • Prefer *-prod images in deployment (smaller, fewer packages).
  • Add a health check (e.g., HEALTHCHECK CMD curl -f http://localhost:8000/api/tts || exit 1). GET /api/tts lists jobs, so point the check at a lightweight status endpoint instead if you introduce one.

Configuration

Edit config.json in the repo root:

{
  "host": "0.0.0.0",
  "port": 8000,
  "concurrency": 2,
  "piper_binary": "bin/piper/linux_x86_64/piper",
  "ffmpeg_path": "ffmpeg",
  "audio_ttl_seconds": 3600,
  "audio_clean_interval_seconds": 120
}

Notes:

  • piper_binary is resolved relative to the installed package (bundled files) first, then the repo root.
  • Voices are discovered dynamically by scanning voices/ for .onnx files (recursively). If an adjacent .onnx.json exists, its metadata may be used to improve the display name.
  • Launch with --gpu (and install onnxruntime-gpu) to enable Piper --cuda.
  • audio_ttl_seconds: how long (seconds) to keep synthesized MP3s before deleting.
  • audio_clean_interval_seconds: frequency of the cleanup sweep (defaults to a fraction of TTL if omitted).
  • Set the PSITTSA_AUDIO_DIR environment variable to override temp storage and persist audio.
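The recursive voice scan described in the notes could look roughly like the sketch below. It is illustrative only: `discover_voices` is a hypothetical name, the id scheme (path relative to voices/ without the .onnx suffix) is an assumption, and the sidecar .onnx.json is only detected here, not parsed.

```python
from pathlib import Path


def discover_voices(voices_dir):
    """Return one record per .onnx model found anywhere under voices_dir."""
    root = Path(voices_dir)
    voices = []
    for model in sorted(root.rglob("*.onnx")):
        if not model.is_file():
            continue
        sidecar = model.with_name(model.name + ".json")  # e.g. voice.onnx.json
        voices.append({
            # Assumed id scheme: relative path without the .onnx suffix.
            "id": model.relative_to(root).with_suffix("").as_posix(),
            "name": model.stem,
            "has_metadata": sidecar.is_file(),
        })
    return voices
```

Because the scan is recursive, models can live in subfolders such as voices/en/ without extra configuration.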

REST API

  • POST /api/tts

    • Body: { "text": "Hello", "engine": "piper", "voice": "<voice-id>" }
    • voice is optional and only applies to piper. If omitted, the first available voice is used.
    • Response: job record { id, engine, voice, voice_name, status, result, error }

  • GET /api/tts → list jobs

  • GET /api/tts/{id} → job status

  • DELETE /api/tts/{id} → cancel job (Piper is terminated; pyttsx3 cancel is best‑effort)

  • GET /audio/{id}.mp3 → MP3 file when completed

    • Supports Range: bytes=... for partial content. Single ranges are the primary case; a multi-range request falls back to its first range.
    • HEAD /audio/{id}.mp3 acts as a readiness probe: it returns 404 until the file is present. After cleanup, completed jobs become expired and the file returns 404 again.
  • GET /api/voices/piper → list available Piper voices: { voices: [{ id, name }] }
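A minimal client sketch for the endpoints above, using only the Python standard library. The helper names (`build_tts_request`, `build_ready_probe`, `audio_is_ready`) are hypothetical, and the base URL is assumed to be wherever your server is listening, for example http://localhost:8000.

```python
import json
import urllib.error
import urllib.request


def build_tts_request(base_url, text, engine="piper", voice=None):
    """Build the POST /api/tts request; voice is left out when not given."""
    payload = {"text": text, "engine": engine}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_ready_probe(base_url, job_id):
    """Build the HEAD /audio/{id}.mp3 readiness probe."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/audio/{job_id}.mp3", method="HEAD"
    )


def audio_is_ready(base_url, job_id, opener=urllib.request.urlopen):
    """Return True once the MP3 exists; a 404 means not ready (or expired)."""
    try:
        with opener(build_ready_probe(base_url, job_id)) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise
```

A typical flow would send the POST with `urllib.request.urlopen`, read the job id from the JSON response, poll `audio_is_ready` until it returns True, then download /audio/{id}.mp3.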

Troubleshooting

  • Piper not found: make sure the binary exists and is executable; update config.json if you moved it.
  • No voices found: place Piper voice .onnx files under voices/ (you can organize in subfolders). Optionally include the matching .onnx.json next to each model.
  • ffmpeg error: install ffmpeg and ensure it's in PATH.
  • pyttsx3 no audio: install espeak on Linux.
  • Cancel: Piper jobs cancel immediately; pyttsx3 cancel is best‑effort.
  • Audio disappeared: likely expired due to TTL. Increase audio_ttl_seconds or set persistent PSITTSA_AUDIO_DIR.
  • 416 Range errors: caused by a malformed or multi-range Range header; the server falls back gracefully when possible.
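To make the Range fallback concrete, here is a sketch of parsing a bytes range that uses only the first range of a multi-range header. It is an illustration, not the server's code: `parse_range` is a hypothetical name, and whether a caller answers an unusable range with 416 or with the full file is left open.

```python
def parse_range(header, size):
    """Return (start, end) inclusive for the first range, or None if unusable."""
    if not header or not header.startswith("bytes=") or size <= 0:
        return None
    # Multi-range headers fall back to their first range.
    first = header[len("bytes="):].split(",")[0].strip()
    start_s, sep, end_s = first.partition("-")
    if not sep:
        return None
    try:
        if start_s == "":
            # Suffix form: "bytes=-N" means the last N bytes.
            length = int(end_s)
            if length <= 0:
                return None
            return max(size - length, 0), size - 1
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)
```

A None result marks the header as malformed or unsatisfiable, which is the case a 416 response is meant for.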
