Faster‑Whisper Transcription Server & API

A production‑ready speech‑to‑text micro‑service built around faster‑whisper. It exposes a low‑latency streaming endpoint (Server‑Sent Events) as well as a Celery‑powered background queue, making it easy to transcribe audio files online or offline at scale.

Features

  • Real‑time streaming — partial & final segments are pushed via SSE while inference is running.
  • 🏎 faster‑whisper backend — leverages quantised models and batched inference for GPU/CPU.
  • ♻️ Model pool — spin up several model replicas (NUM_OF_MODELS) and round‑robin between them (see the sketch after this list).
  • 🎧 Stereo splitting — optional dual‑channel diarisation by splitting stereo files.
  • 🔌 Webhook notifications — receive a JSON callback when a background job completes.
  • 🐇 Celery + Redis queue — run heavy jobs asynchronously.
  • 🔋 Stateless REST layer — thin FastAPI wrapper that can scale horizontally.
  • 📦 Docker‑first — sample Dockerfile and docker‑compose.yml for one‑command deployment.
  • 🔐 .env configuration — zero‑code configuration of model size, batch size, broker, etc.
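
The model pool simply loads NUM_OF_MODELS replicas of the same model and hands requests to them in turn. A minimal sketch of the idea (hypothetical class and names; the real implementation lives in faster_whisper_server/models.py):

# Hypothetical round-robin model pool; names are illustrative.
import itertools
import os

from faster_whisper import WhisperModel

class ModelPool:
    def __init__(self, num_models: int, model_size: str, compute_type: str):
        # Load several replicas so concurrent requests do not serialize
        # on a single WhisperModel instance.
        self._models = [
            WhisperModel(model_size, compute_type=compute_type)
            for _ in range(num_models)
        ]
        self._cycle = itertools.cycle(self._models)

    def acquire(self) -> WhisperModel:
        # Round-robin: hand out the next replica in the cycle.
        return next(self._cycle)

pool = ModelPool(
    num_models=int(os.getenv("NUM_OF_MODELS", "1")),
    model_size=os.getenv("MODEL_SIZE", "large-v3"),
    compute_type=os.getenv("COMPUTE_TYPE", "float16"),
)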

Architecture

                +-----------+              +------------------+
    client ---->|  API 8000 |-----Redis----| Celery Workers   |
    (browser /  | (FastAPI) |              +------------------+
    backend)    |           |----HTTP----->+------------------+
                +-----------+              | Whisper Server   |
                                           | 4213 (FastAPI)   |
                                           +------------------+
  • faster_whisper_server/ runs the actual Whisper model and streams segments back.
  • api/ is an edge service that accepts uploads, triggers synchronous or queued jobs, and renders a minimal HTML demo.
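
The edge service is mostly a proxy: it accepts the upload, forwards it to TRANSCRIBE_SERVER, and relays the streamed response back to the caller. A rough sketch of that forwarding step, assuming requests and FastAPI's StreamingResponse (the actual code under api/services/ may differ):

# Hypothetical forwarding endpoint; the real edge service adds queueing,
# webhooks and stereo splitting on top of this.
import os

import requests
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import StreamingResponse

app = FastAPI()
TRANSCRIBE_SERVER = os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe")

@app.post("/transcribe")
def transcribe(file: UploadFile = File(...)):
    # Forward the raw upload to the Whisper server and relay its stream.
    upstream = requests.post(
        TRANSCRIBE_SERVER,
        files={"file": (file.filename, file.file, file.content_type)},
        stream=True,
    )
    # The upstream connection stays open while chunks are relayed to the client.
    return StreamingResponse(
        upstream.iter_content(chunk_size=8192),
        media_type="text/event-stream",
    )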

Quick start

1. Clone & install

git clone https://github.com/nirnaim/faster-whisper-server.git
cd faster-whisper-server
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt           # see below

2. Create .env

# common
UPLOAD_PATH=./files
# model server
NUM_OF_MODELS=2
MODEL_SIZE=large-v3
COMPUTE_TYPE=float16
BATCH_SIZE=8
# api service
TRANSCRIBE_SERVER=http://localhost:4213/transcribe
DEFAULT_WEBHOOK_URL=https://example.com/webhook
# celery / redis
CELERY_BROKER=redis://localhost:6379/0
CELERY_BACKEND=redis://localhost:6379/0
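
These variables can be loaded with python-dotenv (already in the requirements). A minimal sketch of reading them; the repository's own settings handling may be organised differently:

# Illustrative configuration loading with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

MODEL_SIZE = os.getenv("MODEL_SIZE", "large-v3")
NUM_OF_MODELS = int(os.getenv("NUM_OF_MODELS", "1"))
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "8"))
TRANSCRIBE_SERVER = os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe")
CELERY_BROKER = os.getenv("CELERY_BROKER", "redis://localhost:6379/0")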

3. Run services (dev)

# Start the Whisper model server
uvicorn faster_whisper_server.app:app --host 0.0.0.0 --port 4213 --reload

# In another shell, start API + front‑end demo
uvicorn api.app:app --host 0.0.0.0 --port 8000 --reload

# Optional – background queue
celery -A api.tasks worker --loglevel=info
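
Behind the queue, an offline job boils down to reading the stored upload, calling the Whisper server, and notifying the webhook. A hypothetical sketch of such a task (the real definitions live in api/tasks.py; names and the payload shape here are illustrative):

# Illustrative Celery task; see api/tasks.py for the actual jobs.
import os

import requests
from celery import Celery

celery_app = Celery(
    "api",
    broker=os.getenv("CELERY_BROKER", "redis://localhost:6379/0"),
    backend=os.getenv("CELERY_BACKEND", "redis://localhost:6379/0"),
)

@celery_app.task
def transcribe_file(path: str, webhook_url: str):
    # Send the stored upload to the Whisper server, then post the result
    # to the caller's webhook.
    with open(path, "rb") as f:
        resp = requests.post(
            os.getenv("TRANSCRIBE_SERVER", "http://localhost:4213/transcribe"),
            files={"file": f},
        )
    requests.post(webhook_url, json={"status": "done", "result": resp.text})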

4. Usage

Online (streaming)

curl -N -F file=@sample.wav http://localhost:8000/transcribe
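
The same endpoint can be consumed from Python. The sketch below assumes each segment arrives as a JSON object on an SSE "data:" line; check the actual stream framing before relying on it:

# Illustrative streaming client; the event framing is an assumption.
import json

import requests

with open("sample.wav", "rb") as audio, requests.post(
    "http://localhost:8000/transcribe",
    files={"file": audio},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            segment = json.loads(line[len("data:"):])
            print(segment.get("text", ""))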

Offline (queued)

curl -F file=@sample.wav -F online=false http://localhost:8000/transcribe
# → {"task_id":"d0bf...","status":"submitted"}
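
When the queued job finishes, the service POSTs a JSON callback to webhook_url (or DEFAULT_WEBHOOK_URL). A minimal receiver sketch; the payload fields shown are assumptions, not the exact schema:

# Hypothetical webhook receiver; field names are illustrative.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def transcription_done(request: Request):
    payload = await request.json()
    # e.g. {"task_id": "...", "status": "done", "text": "..."}
    print("job finished:", payload)
    return {"ok": True}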

5. Docker

docker compose up --build -d

The compose file starts Redis, the model server, the API gateway and a Celery worker.

API reference

Service         Method  Path         Params                                                  Description
Whisper server  POST    /transcribe  file (audio)                                            Streams JSON events ({"text": ...})
API             POST    /transcribe  file, online (bool), stereo_split (bool), webhook_url   Streams or enqueues job
API             GET     /            -                                                       Minimal HTML test page
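
The stereo_split option splits a stereo upload into two mono channels so each side of a call can be transcribed separately. A sketch of that step with pydub (hypothetical helper; the service's own splitting logic may differ):

# Illustrative stereo split using pydub.
from pydub import AudioSegment

def split_stereo(path: str) -> tuple[str, str]:
    audio = AudioSegment.from_file(path)
    left, right = audio.split_to_mono()  # channel 0 and channel 1
    left_path, right_path = path + ".left.wav", path + ".right.wav"
    left.export(left_path, format="wav")
    right.export(right_path, format="wav")
    return left_path, right_path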

Project structure

.
├── api/                       # Edge service + celery
│   ├── services/              # Wrapper around model server
│   ├── tasks.py               # Celery jobs
│   └── templates/             # Demo UI
├── faster_whisper_server/     # High‑performance inference service
│   ├── models.py              # Model pool
│   └── app.py                 # FastAPI entrypoint
└── README.md

Requirements

fastapi
uvicorn[standard]
faster-whisper
torch>=2.0
celery
redis
requests
python-dotenv
pydub
clickhouse-driver   # optional, metrics sink

Tip: a GPU with at least 8 GB of VRAM is recommended for large-v3.

Contributing

  1. Fork -> create branch -> PR.
  2. Please run ruff and black.

License

MIT © 2025 Your Name
