An offline Text-To-Speech service you can host at home. Web UI + REST API + Chrome extension. Concurrency with cancel. MP3 output. Ephemeral temp storage + automatic audio cleanup.
- FastAPI server with a simple Web UI
- Engines: `Piper` (primary) and `pyttsx3` (secondary)
- Configurable concurrency (limit parallel synth jobs)
- Queue + cancel (cancel queued or running `Piper` jobs)
- MP3 output (via `ffmpeg`)
- HTTP Range support for `/audio/{id}.mp3` (seek / progressive play)
- `HEAD /audio/{id}.mp3` readiness probe (used by clients to avoid downloading early)
- Automatic TTL-based cleanup of generated audio
- Ephemeral storage in system temp dir by default (override with `PSITTSA_AUDIO_DIR`)

- Python 3.9+
- `ffmpeg` in `PATH`
- For `pyttsx3` on Linux: install espeak (`sudo apt install espeak`)
- Optional GPU: install NVIDIA drivers and `onnxruntime-gpu`, then start the app with the `--gpu` flag.

Install the package (adds the psittsa-webapp console script):
python -m venv .venv
source .venv/bin/activate
pip install .
# GPU support (optional): pip install .[gpu]

Pick one of the following from the project root:
psittsa-webapp # CPU
psittsa-webapp --gpu # GPU (requires onnxruntime-gpu + drivers)
# or run the module directly
python -m psittsa.webapp
# or via uvicorn directly
uvicorn psittsa.webapp:app --host 0.0.0.0 --port 8000

Server uses host/port from `config.json` (default `0.0.0.0:8000`).
Open the Web UI: http://localhost:8000/
Generated MP3s are written (by default) into a per-system temp folder such as /tmp/psittsa_audio and served at /audio/{id}.mp3.
To persist audio across restarts, set a directory explicitly (env var wins over defaults):
export PSITTSA_AUDIO_DIR=/var/lib/psittsa/audio
psittsa-webapp

Or in Docker:
docker run --rm -p 8000:8000 \
-e PSITTSA_AUDIO_DIR=/app/audio \
-v psittsa_audio:/app/audio \
psittsa:cpu

TTL (default 3600s) is configurable; after expiration files are removed and completed jobs transition to `expired`.
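
The TTL-based cleanup can be pictured with a short sketch. This is an illustration only, not psittsa's actual implementation: the helper name `sweep_expired` is hypothetical, and measuring a file's age from its modification time is an assumption.

```python
import time
from pathlib import Path


def sweep_expired(audio_dir, ttl_seconds=3600, now=None):
    """Delete *.mp3 files older than ttl_seconds and return their names.

    Hypothetical helper: age is measured from the file's modification time,
    which is an assumption about how expiry could be tracked.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(audio_dir).glob("*.mp3"):
        try:
            if now - path.stat().st_mtime > ttl_seconds:
                path.unlink()
                removed.append(path.name)
        except FileNotFoundError:
            # The file vanished between listing and deletion; nothing to do.
            continue
    return sorted(removed)
```

A periodic task could call such a function every `audio_clean_interval_seconds` and mark the matching jobs as `expired`.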
You can run psittsa in containers. Two build variants:
- CPU-only (default `Dockerfile`)
- GPU-enabled (`Dockerfile.gpu`, CUDA + onnxruntime-gpu)

For smaller production images (no build toolchain, no editable installs) use the multi-stage variants:
- Production CPU (`Dockerfile.prod`)
- Production GPU (`Dockerfile.gpu.prod`)

docker build -t psittsa:cpu .
docker build -f Dockerfile.prod -t psittsa:cpu-prod .

docker run --rm -p 8000:8000 \
-e PSITTSA_AUDIO_DIR=/app/audio \
-v psittsa_audio:/app/audio \
-v $(pwd)/voices:/app/psittsa/voices:ro \
psittsa:cpu

Open: http://localhost:8000/
If you bundled voices inside the package and don't need an external voices mount:
docker run --rm -p 8000:8000 psittsa:cpu

The GPU images require NVIDIA drivers + nvidia-container-toolkit.
docker build -f Dockerfile.gpu -t psittsa:gpu .
docker build -f Dockerfile.gpu.prod -t psittsa:gpu-prod .

Use the `--gpu` flag when launching the container (the flag is part of the entrypoint script):
docker run --rm -p 8000:8000 \
--gpus all \
-e PSITTSA_AUDIO_DIR=/app/audio \
-v psittsa_audio:/app/audio \
-v $(pwd)/voices:/app/psittsa/voices:ro \
-v $(pwd)/config.json:/app/psittsa/config.json:ro \
psittsa:gpu

docker buildx build --platform linux/amd64,linux/arm64 -t youruser/psittsa:cpu --push .

- Add `--restart unless-stopped` for a long-running service.
- Use a reverse proxy (Traefik / Nginx) to terminate TLS.
- Persist data with the named volume (`psittsa_data`).
- Mount `voices/` read-only so updating models doesn’t require a rebuild.
- Prefer `*-prod` images in deployment (smaller, fewer packages).
- Add a health check (e.g., `HEALTHCHECK CMD curl -f http://localhost:8000/api/tts || exit 1`), or point it at a lightweight status endpoint if you introduce one.

Edit config.json in the repo root:
{
"host": "0.0.0.0",
"port": 8000,
"concurrency": 2,
"piper_binary": "bin/piper/linux_x86_64/piper",
"ffmpeg_path": "ffmpeg",
"audio_ttl_seconds": 3600,
"audio_clean_interval_seconds": 120
}

Notes:
- `piper_binary` is resolved relative to the installed package (bundled files) first, then the repo root.
- Voices are discovered dynamically by scanning `voices/` for `.onnx` files (recursively). If an adjacent `.onnx.json` exists, its metadata may be used to improve the display name.
- Launch with `--gpu` (and install `onnxruntime-gpu`) to enable Piper `--cuda`.
- `audio_ttl_seconds`: how long (seconds) to keep synthesized MP3s before deleting.
- `audio_clean_interval_seconds`: frequency of the cleanup sweep (defaults to a fraction of TTL if omitted).
- Set `PSITTSA_AUDIO_DIR` env to override temp storage and persist audio.
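
The voice discovery described in the notes above can be sketched roughly like this. It is not the project's actual code: the function name `discover_voices`, the id scheme (path relative to `voices/` without the `.onnx` suffix), and the `"name"` key in the sidecar file are all assumptions made for illustration.

```python
import json
from pathlib import Path


def discover_voices(voices_dir):
    """Return [{"id": ..., "name": ...}] for every .onnx file under voices_dir.

    Assumptions (not confirmed by the project): the id is the model path
    relative to voices_dir without the .onnx suffix, and an optional "name"
    key in the adjacent .onnx.json supplies the display name.
    """
    root = Path(voices_dir)
    voices = []
    for model in sorted(root.rglob("*.onnx"), key=lambda p: p.as_posix()):
        if not model.is_file():
            continue
        voice_id = model.relative_to(root).with_suffix("").as_posix()
        name = model.stem  # fallback display name
        sidecar = model.with_name(model.name + ".json")  # e.g. amy.onnx.json
        if sidecar.is_file():
            try:
                meta = json.loads(sidecar.read_text(encoding="utf-8"))
                if isinstance(meta, dict) and isinstance(meta.get("name"), str):
                    name = meta["name"]
            except (OSError, json.JSONDecodeError):
                pass  # unreadable metadata: keep the file-stem name
        voices.append({"id": voice_id, "name": name})
    return voices
```

Because the scan is recursive, models can be grouped into subfolders (for example by language) without any extra configuration.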
- POST `/api/tts`
  - Body: `{ "text": "Hello", "engine": "piper", "voice": "<voice-id>" }`
  - `voice` is optional and only applies to `piper`. If omitted, the first available voice is used.
  - Response: job record `{ id, engine, voice, voice_name, status, result, error }`
- GET `/api/tts` → list jobs
- GET `/api/tts/{id}` → job status
- DELETE `/api/tts/{id}` → cancel job (Piper is terminated; pyttsx3 cancel is best‑effort)
- GET `/audio/{id}.mp3` → MP3 file when completed
  - Supports `Range: bytes=...` for partial content (single range primary; multi-range probes fallback to first).
  - `HEAD /audio/{id}.mp3` readiness (404 until present). After cleanup, completed jobs become `expired` (404).
- GET `/api/voices/piper` → list available Piper voices: `{ voices: [{ id, name }] }`

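
A minimal client sketch for the endpoints above, using only the Python standard library. The helper names (`build_tts_request`, `audio_ready`, `synthesize`) and the polling parameters are hypothetical. The sketch relies only on the documented behavior that `POST /api/tts` returns a job record with an `id` and that `HEAD /audio/{id}.mp3` answers 404 until the file is present.

```python
import json
import time
import urllib.error
import urllib.request


def build_tts_request(base_url, text, engine="piper", voice=None):
    """Build the POST /api/tts request; "voice" is omitted unless given."""
    body = {"text": text, "engine": engine}
    if voice is not None:
        body["voice"] = voice
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def audio_ready(base_url, job_id):
    """HEAD probe: True once the MP3 exists, False while the server says 404."""
    url = f"{base_url.rstrip('/')}/audio/{job_id}.mp3"
    try:
        with urllib.request.urlopen(urllib.request.Request(url, method="HEAD")) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise


def synthesize(base_url, text, poll_seconds=0.5, timeout_seconds=60.0, **kwargs):
    """Submit text, wait until the audio is present, and return the MP3 bytes."""
    with urllib.request.urlopen(build_tts_request(base_url, text, **kwargs)) as resp:
        job = json.load(resp)
    deadline = time.monotonic() + timeout_seconds
    while not audio_ready(base_url, job["id"]):
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job['id']} not ready after {timeout_seconds}s")
        time.sleep(poll_seconds)
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/audio/{job['id']}.mp3") as resp:
        return resp.read()
```

With a server running locally, `synthesize("http://localhost:8000", "Hello")` would return the MP3 bytes. A more careful client would also check `GET /api/tts/{id}` for the `error` field, since a failed job never produces audio and the loop above would only stop at the timeout.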
- Piper not found: make sure the binary exists and is executable; update `config.json` if you moved it.
- No voices found: place Piper voice `.onnx` files under `voices/` (you can organize in subfolders). Optionally include the matching `.onnx.json` next to each model.
- `ffmpeg` error: install `ffmpeg` and ensure it's in `PATH`.
- `pyttsx3` no audio: install `espeak` on Linux.
- Cancel: Piper jobs cancel immediately; pyttsx3 cancel is best‑effort.
- Audio disappeared: likely expired due to TTL. Increase `audio_ttl_seconds` or set persistent `PSITTSA_AUDIO_DIR`.
- 416 Range errors: malformed or multi-range; server now falls back gracefully when possible.
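
To make the Range behavior concrete, here is one way a server could interpret a `Range` header for `/audio/{id}.mp3`. It is a sketch under stated assumptions, not psittsa's actual code: `parse_range` and `RangeNotSatisfiable` are hypothetical names, a malformed header is treated as "serve the whole file", and a multi-range request falls back to its first range.

```python
class RangeNotSatisfiable(Exception):
    """Signals that the caller should answer 416."""


def _is_number(text):
    return text.isascii() and text.isdigit()


def parse_range(header, size):
    """Return (start, end), inclusive, for the first range in the header.

    Returns None when the header is absent or malformed (serve the full file).
    Raises RangeNotSatisfiable when the range lies outside the file.
    """
    if not header or not header.startswith("bytes="):
        return None
    # Multi-range request: fall back to the first range only.
    first = header[len("bytes="):].split(",")[0].strip()
    start_s, sep, end_s = first.partition("-")
    if not sep or (start_s and not _is_number(start_s)) or (end_s and not _is_number(end_s)):
        return None
    if not start_s and not end_s:
        return None
    if not start_s:
        # Suffix form "-N": the last N bytes of the file.
        length = int(end_s)
        if length == 0 or size == 0:
            raise RangeNotSatisfiable(header)
        return max(size - length, 0), size - 1
    start = int(start_s)
    end = int(end_s) if end_s else size - 1
    if end_s and end < start:
        return None  # malformed: ignore the header
    if start >= size:
        raise RangeNotSatisfiable(header)
    return start, min(end, size - 1)
```

For a 1000-byte file, `parse_range("bytes=0-9, 20-29", 1000)` yields `(0, 9)`, which a server would answer with `206 Partial Content` and `Content-Range: bytes 0-9/1000`.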
