GitHub - pboechat/psittsa: An offline Text-To-Speech service you can host at home

An offline Text-To-Speech service you can host at home. It provides a Web UI, a REST API, and a Chrome extension, with configurable concurrency and job cancellation, MP3 output, ephemeral temp storage, and automatic audio cleanup.

What's inside

FastAPI server with a simple Web UI
Engines: Piper (primary) and pyttsx3 (secondary)
Configurable concurrency (limit parallel synth jobs)
Queue + cancel (cancel queued or running Piper jobs)
MP3 output (via ffmpeg)
HTTP Range support for /audio/{id}.mp3 (seek / progressive play)
HEAD /audio/{id}.mp3 readiness probe (used by clients to avoid downloading early)
Automatic TTL-based cleanup of generated audio
Ephemeral storage in system temp dir by default (override with PSITTSA_AUDIO_DIR)
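
To illustrate the "configurable concurrency" and "queue + cancel" items, here is a minimal sketch of how a bounded job queue could work with asyncio. It is not taken from the psittsa source: the JobQueue class, its method names, and the "cancelled" status string are assumptions made for this example. The real service also has to terminate the Piper process on cancel, which this sketch leaves out.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class JobQueue:
    """Hypothetical bounded queue: at most `concurrency` jobs run at once."""

    def __init__(self, concurrency: int) -> None:
        # Create this inside a running event loop (required on Python 3.9).
        self._sem = asyncio.Semaphore(concurrency)
        self._tasks: Dict[str, asyncio.Task] = {}
        self.status: Dict[str, str] = {}

    def submit(self, job_id: str, work: Callable[[], Awaitable[None]]) -> None:
        """Register a job; it stays 'queued' until a slot is free."""
        self.status[job_id] = "queued"
        self._tasks[job_id] = asyncio.create_task(self._run(job_id, work))

    async def _run(self, job_id: str, work: Callable[[], Awaitable[None]]) -> None:
        async with self._sem:  # wait here while all slots are busy
            self.status[job_id] = "running"
            await work()
        self.status[job_id] = "completed"

    def cancel(self, job_id: str) -> bool:
        """Cancel a queued or running job; returns False if it already finished."""
        task = self._tasks.get(job_id)
        if task is None or task.done():
            return False
        task.cancel()
        self.status[job_id] = "cancelled"  # assumed status name
        return True

    async def join(self) -> None:
        """Wait for every submitted job to finish or be cancelled."""
        await asyncio.gather(*self._tasks.values(), return_exceptions=True)
```

The status is set in cancel() rather than in an exception handler because a task cancelled before its first step never enters the coroutine body.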

Requirements

Python 3.9+
ffmpeg in PATH
For pyttsx3 on Linux: install espeak (sudo apt install espeak)
Optional GPU: install NVIDIA drivers and onnxruntime-gpu, then start the app with the --gpu flag.

Setup

Install the package (adds the psittsa-webapp console script):

python -m venv .venv
source .venv/bin/activate
pip install .
# GPU support (optional): pip install .[gpu]

Run it

Pick one of the following from the project root:

psittsa-webapp               # CPU
psittsa-webapp --gpu         # GPU (requires onnxruntime-gpu + drivers)

# or run the module directly
python -m psittsa.webapp

# or via uvicorn directly
uvicorn psittsa.webapp:app --host 0.0.0.0 --port 8000

Server uses host/port from config.json (default 0.0.0.0:8000).

Open the Web UI: http://localhost:8000/

Generated MP3s are written (by default) into a per-system temp folder such as /tmp/psittsa_audio and served at /audio/{id}.mp3.

To persist audio across restarts, set a directory explicitly (env var wins over defaults):

export PSITTSA_AUDIO_DIR=/var/lib/psittsa/audio
psittsa-webapp

Or in Docker:

docker run --rm -p 8000:8000 \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  psittsa:cpu

TTL (default 3600s) is configurable; after expiration files are removed and completed jobs transition to expired.
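
The TTL behaviour described above can be sketched as a periodic sweep over the audio directory. This is only an illustration under assumptions: the function name sweep_expired is hypothetical, and using each file's modification time as its age is a guess about how expiry could be measured, not a description of the actual implementation.

```python
import time
from pathlib import Path
from typing import List, Optional


def sweep_expired(audio_dir: Path, ttl_seconds: float,
                  now: Optional[float] = None) -> List[str]:
    """Delete *.mp3 files older than ttl_seconds; return the removed job ids.

    Assumption: a file's age is measured from its modification time, and
    the job id is the file name without the .mp3 extension.
    """
    now = time.time() if now is None else now
    removed = []
    for mp3 in audio_dir.glob("*.mp3"):
        try:
            if now - mp3.stat().st_mtime > ttl_seconds:
                mp3.unlink()
                removed.append(mp3.stem)
        except FileNotFoundError:
            # The file vanished between listing and deletion; nothing to do.
            pass
    return sorted(removed)
```

A background task could call this every audio_clean_interval_seconds and mark the returned job ids as expired.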

Docker Usage

You can run psittsa in containers. Two build variants:

CPU-only (default Dockerfile)
GPU-enabled (Dockerfile.gpu, CUDA + onnxruntime-gpu)

For smaller production images (no build toolchain, no editable installs) use the multi-stage variants:

Production CPU (Dockerfile.prod)
Production GPU (Dockerfile.gpu.prod)

Build (CPU)

docker build -t psittsa:cpu .

Build (CPU - production image)

docker build -f Dockerfile.prod -t psittsa:cpu-prod .

Run (CPU)

docker run --rm -p 8000:8000 \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  -v $(pwd)/voices:/app/psittsa/voices:ro \
  psittsa:cpu

Open: http://localhost:8000/

If you bundled voices inside the package and don't need an external voices mount:

docker run --rm -p 8000:8000 psittsa:cpu

Build (GPU)

Requires NVIDIA drivers + nvidia-container-toolkit.

docker build -f Dockerfile.gpu -t psittsa:gpu .

Build (GPU - production image)

docker build -f Dockerfile.gpu.prod -t psittsa:gpu-prod .

Run (GPU)

The GPU image's entrypoint script already includes the --gpu flag, so you only need to expose the GPUs to the container with --gpus all:

docker run --rm -p 8000:8000 \
  --gpus all \
  -e PSITTSA_AUDIO_DIR=/app/audio \
  -v psittsa_audio:/app/audio \
  -v $(pwd)/voices:/app/psittsa/voices:ro \
  -v $(pwd)/config.json:/app/psittsa/config.json:ro \
  psittsa:gpu

Multi-Arch Example (CPU image)

docker buildx build --platform linux/amd64,linux/arm64 -t youruser/psittsa:cpu --push .

Production Tips

Add --restart unless-stopped for a long-running service.
Use a reverse proxy (Traefik / Nginx) to terminate TLS.
Persist generated audio with a named volume (psittsa_audio in the run commands above).
Mount voices/ read-only so updating models doesn't require a rebuild.
Prefer *-prod images in deployment (smaller, fewer packages).
Add a health check (e.g., HEALTHCHECK CMD curl -f http://localhost:8000/api/tts || exit 1), or point it at a lightweight status endpoint if you introduce one.

Configuration

Edit config.json in the repo root:

{
  "host": "0.0.0.0",
  "port": 8000,
  "concurrency": 2,
  "piper_binary": "bin/piper/linux_x86_64/piper",
  "ffmpeg_path": "ffmpeg",
  "audio_ttl_seconds": 3600,
  "audio_clean_interval_seconds": 120
}

Notes:

piper_binary is resolved relative to the installed package (bundled files) first, then the repo root.
Voices are discovered dynamically by scanning voices/ for .onnx files (recursively). If an adjacent .onnx.json exists, its metadata may be used to improve the display name.
Launch with --gpu (and install onnxruntime-gpu) to enable Piper --cuda.
audio_ttl_seconds: how long (seconds) to keep synthesized MP3s before deleting.
audio_clean_interval_seconds: frequency of the cleanup sweep (defaults to a fraction of TTL if omitted).
Set the PSITTSA_AUDIO_DIR environment variable to override the temp storage location and persist audio.
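
The voice discovery note above can be pictured with the following sketch, which scans a directory recursively for .onnx files and looks for an adjacent .onnx.json. It is an assumption-laden illustration, not the project's code: discover_voices is a hypothetical function, the voice id format (relative path without the extension) is a guess, and the "name" metadata key is invented for the example since the README does not say which metadata fields are used.

```python
import json
from pathlib import Path
from typing import Dict, List


def discover_voices(voices_dir: str) -> List[Dict[str, str]]:
    """Return [{"id": ..., "name": ...}] for every .onnx file under voices_dir."""
    root = Path(voices_dir)
    voices = []
    for model in sorted(root.rglob("*.onnx")):
        # Assumed id format: path relative to voices/, without ".onnx".
        voice_id = model.relative_to(root).with_suffix("").as_posix()
        name = model.stem  # fallback display name
        meta_path = model.with_name(model.name + ".json")  # e.g. voice.onnx.json
        if meta_path.is_file():
            try:
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
            except (OSError, ValueError):
                meta = {}
            # Hypothetical key: the real metadata layout may differ.
            if isinstance(meta, dict) and isinstance(meta.get("name"), str):
                name = meta["name"]
        voices.append({"id": voice_id, "name": name})
    return voices
```

The result has the same shape as the voices list returned by GET /api/voices/piper.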

REST API

POST /api/tts → create a synthesis job
- Body: { "text": "Hello", "engine": "piper", "voice": "<voice-id>" }
- voice is optional and only applies to piper. If omitted, the first available voice is used.
- Response: job record { id, engine, voice, voice_name, status, result, error }
GET /api/tts → list jobs
GET /api/tts/{id} → job status
DELETE /api/tts/{id} → cancel job (Piper is terminated; pyttsx3 cancel is best‑effort)
GET /audio/{id}.mp3 → MP3 file when completed
- Supports Range: bytes=... for partial content (a single range is the primary case; multi-range requests fall back to the first range).
HEAD /audio/{id}.mp3 → readiness probe (404 until the file is present). After cleanup, completed jobs become expired (404).
GET /api/voices/piper → list available Piper voices: { voices: [{ id, name }] }
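
As a rough illustration of the endpoints above, the following client sketch uses only the Python standard library. The helper names (build_tts_request, submit, is_audio_ready, audio_url) are hypothetical, and sending the body as JSON with a Content-Type: application/json header is an assumption based on the body shape shown above.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default host/port from config.json


def build_tts_request(text: str, engine: str = "piper",
                      voice: Optional[str] = None,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/tts request without sending it."""
    payload = {"text": text, "engine": engine}
    if voice is not None:  # voice is optional and only applies to piper
        payload["voice"] = voice
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def audio_url(job_id: str, base_url: str = BASE_URL) -> str:
    return f"{base_url}/audio/{job_id}.mp3"


def submit(text: str, **kwargs) -> dict:
    """Send the request and return the job record (id, status, ...)."""
    with urllib.request.urlopen(build_tts_request(text, **kwargs)) as resp:
        return json.load(resp)


def is_audio_ready(job_id: str, base_url: str = BASE_URL) -> bool:
    """HEAD probe: True once the MP3 exists, False while the server answers 404."""
    req = urllib.request.Request(audio_url(job_id, base_url), method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

A typical flow would be job = submit("Hello"), then polling is_audio_ready(job["id"]) before downloading audio_url(job["id"]).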

Troubleshooting

Piper not found: make sure the binary exists and is executable; update config.json if you moved it.
No voices found: place Piper voice .onnx files under voices/ (you can organize in subfolders). Optionally include the matching .onnx.json next to each model.
ffmpeg error: install ffmpeg and ensure it's in PATH.
pyttsx3 no audio: install espeak on Linux.
Cancel: Piper jobs cancel immediately; pyttsx3 cancel is best‑effort.
Audio disappeared: likely expired due to TTL. Increase audio_ttl_seconds or set persistent PSITTSA_AUDIO_DIR.
416 Range errors: usually caused by a malformed or multi-range Range header; the server now falls back gracefully when possible.
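
To make the Range behaviour more concrete, here is a minimal sketch of parsing a Range header where a multi-range request falls back to its first range. It is an illustration, not the server's actual code: parse_range is a hypothetical helper, and what the caller does with a None result (return 416, or serve the whole file) is left open.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^(\d*)-(\d*)$")


def parse_range(header: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) byte range, or None if unusable.

    Only the first range of a multi-range header is considered.
    """
    if not header or not header.startswith("bytes="):
        return None
    first = header[len("bytes="):].split(",")[0].strip()
    match = _RANGE_RE.match(first)
    if not match:
        return None
    start_s, end_s = match.groups()
    if not start_s and not end_s:
        return None  # "bytes=-" carries no information
    if not start_s:
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(end_s)
        if length == 0:
            return None
        start, end = max(size - length, 0), size - 1
    else:
        start = int(start_s)
        # Open-ended or overlong ranges are clamped to the end of the file.
        end = min(int(end_s), size - 1) if end_s else size - 1
    if start > end or start >= size:
        return None
    return start, end
```

For a 1000-byte file, "bytes=0-1,5-9" is treated like "bytes=0-1", and "bytes=2000-" yields None.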
