A production‑ready speech‑to‑text micro‑service built around faster‑whisper. It exposes a low‑latency streaming endpoint (Server‑Sent Events) as well as a Celery‑powered background queue, making it easy to transcribe audio files online or offline at scale.
- ⚡ Real‑time streaming — partial & final segments are pushed via SSE while inference is running.
- 🏎 faster‑whisper backend — leverages quantised models and batched inference for GPU/CPU.
- ♻️ Model pool — spin up several model replicas (`NUM_OF_MODELS`) and round‑robin requests between them (see the sketch after this list).
- 🎧 Stereo splitting — optional dual‑channel diarisation by splitting stereo files.
- 🔌 Webhook notifications — receive a JSON callback when a background job completes.
- 🐇 Celery + Redis queue — run heavy jobs asynchronously.
- 🔋 Stateless REST layer — thin FastAPI wrapper that can scale horizontally.
- 📦 Docker‑first — sample `Dockerfile` and `docker-compose.yml` for one‑command deployment.
- 🔐 .env configuration — zero‑code configuration of model size, batch size, broker, etc.
              +-----------+               +------------------+
 client ----->| API 8000  |-----Redis---->| Celery Workers   |
 (browser /   | (FastAPI) |               +------------------+
  backend)    |           |---HTTP------->+------------------+
              +-----------+               | Whisper Server   |
                                          | 4213 (FastAPI)   |
                                          +------------------+
`faster_whisper_server/` runs the actual Whisper model and streams segments back. `api/` is an edge service that accepts uploads, triggers synchronous or queued jobs, and renders a minimal HTML demo.
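For intuition, the edge endpoint essentially proxies the upload to the model server and relays the SSE stream back to the caller. The sketch below is not the actual implementation in `api/`; the endpoint shape and names are assumptions, and the upstream URL comes from `TRANSCRIBE_SERVER` (see the `.env` example further down).

```python
import os

import requests
from fastapi import FastAPI, UploadFile
from fastapi.responses import StreamingResponse

app = FastAPI()
TRANSCRIBE_SERVER = os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe")


@app.post("/transcribe")
def transcribe(file: UploadFile):
    """Proxy the upload to the Whisper server and relay its SSE stream to the caller."""
    upstream = requests.post(
        TRANSCRIBE_SERVER,
        files={"file": (file.filename, file.file)},
        stream=True,  # do not buffer: relay partial segments as they arrive
    )
    upstream.raise_for_status()
    return StreamingResponse(
        upstream.iter_content(chunk_size=None),
        media_type="text/event-stream",
    )
```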
git clone https://github.com/your-org/faster-whisper-server.git
cd faster-whisper-server
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt # see below
# common
UPLOAD_PATH=./files
# model server
NUM_OF_MODELS=2
MODEL_SIZE=large-v3
COMPUTE_TYPE=float16
BATCH_SIZE=8
# api service
TRANSCRIBE_SERVER=http://localhost:4213/transcribe
DEFAULT_WEBHOOK_URL=https://example.com/webhook
# celery / redis
CELERY_BROKER=redis://localhost:6379/0
CELERY_BACKEND=redis://localhost:6379/0
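A minimal sketch of how these variables could be read with python-dotenv (already in the requirements). The variable names match the `.env` above, but this settings module is illustrative, not the project's own config code.

```python
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory

# Model server settings
NUM_OF_MODELS = int(os.getenv("NUM_OF_MODELS", "1"))
MODEL_SIZE = os.getenv("MODEL_SIZE", "large-v3")
COMPUTE_TYPE = os.getenv("COMPUTE_TYPE", "float16")
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "8"))

# API / queue settings
TRANSCRIBE_SERVER = os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe")
CELERY_BROKER = os.getenv("CELERY_BROKER", "redis://localhost:6379/0")
CELERY_BACKEND = os.getenv("CELERY_BACKEND", "redis://localhost:6379/0")
```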
# Start the Whisper model server
uvicorn faster_whisper_server.app:app --host 0.0.0.0 --port 4213 --reload
# In another shell, start API + front‑end demo
uvicorn api.app:app --host 0.0.0.0 --port 8000 --reload
# Optional – background queue
celery -A api.tasks worker --loglevel=info
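For reference, a stripped-down version of what a task in `api.tasks` might look like. The task name, arguments, and webhook payload shape here are assumptions for illustration, not the project's actual code.

```python
import os

import requests
from celery import Celery

celery_app = Celery(
    "api",
    broker=os.getenv("CELERY_BROKER", "redis://localhost:6379/0"),
    backend=os.getenv("CELERY_BACKEND", "redis://localhost:6379/0"),
)


@celery_app.task
def transcribe_file(path: str, webhook_url: str | None = None) -> dict:
    """Run a full (non-streaming) transcription and optionally POST the result to a webhook."""
    server = os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe")
    with open(path, "rb") as audio:
        response = requests.post(server, files={"file": audio})
    response.raise_for_status()
    result = {"text": response.text}
    if webhook_url:
        # Fire-and-forget callback; the payload shape here is illustrative only.
        requests.post(webhook_url, json=result, timeout=10)
    return result
```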
Online (streaming)
curl -N -F file=@sample.wav http://localhost:8000/transcribe
Offline (queued)
curl -F file=@sample.wav -F online=false http://localhost:8000/transcribe
# → {"task_id":"d0bf...","status":"submitted"}
docker compose up --build -d
The compose file starts Redis, the model server, the API gateway and a Celery worker.
| Service        | Method | Path          | Params                                                        | Description                           |
|----------------|--------|---------------|---------------------------------------------------------------|---------------------------------------|
| Whisper server | POST   | `/transcribe` | `file` (audio)                                                | Streams JSON events (`{"text": ...}`) |
| API            | POST   | `/transcribe` | `file`, `online` (bool), `stereo_split` (bool), `webhook_url` | Streams or enqueues job               |
| API            | GET    | `/`           | –                                                             | Minimal HTML test page                |
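When a queued job completes, the worker POSTs a JSON callback to `webhook_url` (or `DEFAULT_WEBHOOK_URL`). A minimal receiver might look like the sketch below; the payload field names shown are assumptions, so check the actual callback body in your deployment.

```python
from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/webhook")
async def receive_transcription(request: Request):
    """Accept the completion callback sent by the background worker."""
    payload = await request.json()
    # Field names such as "task_id" and "text" are assumed, not guaranteed.
    print(payload.get("task_id"), payload.get("text"))
    return {"ok": True}
```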
.
├── api/ # Edge service + celery
│ ├── services/ # Wrapper around model server
│ ├── tasks.py # Celery jobs
│ └── templates/ # Demo UI
├── faster_whisper_server/ # High‑performance inference service
│ ├── models.py # Model pool
│ └── app.py # FastAPI entrypoint
└── README.md
fastapi
uvicorn[standard]
faster-whisper
torch>=2.0
celery
redis
requests
python-dotenv
pydub
clickhouse-driver # optional, metrics sink
Tip: a GPU with at least 8 GB VRAM is recommended for `large-v3`.
- Fork -> create branch -> PR.
- Please run `ruff` and `black`.
MIT © 2025 Your Name