
Whisper ASR Box

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio; they are multitask models that can perform multilingual speech recognition, speech translation, and language identification.

Features

The current release (v1.8.2) supports multiple Whisper engines and models; see the Key Features and Environment Variables sections below.

Quick Usage

CPU

docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest

GPU

docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest
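The same setup can also be expressed as a Compose file. This is an illustrative sketch combining the port mapping, engine/model settings, and cache mount from the `docker run` examples above; the service name is arbitrary.

```yaml
# Illustrative docker-compose.yml; values mirror the docker run
# examples above (service name and paths are placeholders).
services:
  whisper-asr:
    image: onerahmet/openai-whisper-asr-webservice:latest
    ports:
      - "9000:9000"
    environment:
      - ASR_MODEL=base
      - ASR_ENGINE=openai_whisper
    volumes:
      - ./cache:/root/.cache/
```

Start it with `docker compose up -d`; the mounted `./cache` directory persists downloaded models across container restarts.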

Key Features

  • Support for multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX)
  • Multiple output formats (text, JSON, VTT, SRT, TSV)
  • Word-level timestamps
  • Voice activity detection (VAD) filtering
  • Speaker diarization (with WhisperX)
  • FFmpeg integration for broad audio/video format support
  • GPU acceleration
  • Configurable model loading/unloading
  • REST API with Swagger documentation
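With a container from Quick Usage listening on port 9000, a transcription request might look like the sketch below. The endpoint path, query parameters, and form field name are taken from the service's Swagger UI conventions; verify them against your running instance at http://localhost:9000, and treat `sample.wav` as a placeholder for your own audio file.

```shell
# Sketch of a transcription request (sample.wav is a placeholder).
# Check the exact endpoint and parameters in the Swagger UI first.
curl -X POST "http://localhost:9000/asr?output=json&task=transcribe" \
  -F "audio_file=@sample.wav"
```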

Environment Variables

Key configuration options:

  • ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
  • ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
  • ASR_MODEL_PATH: Custom path to store/load models
  • ASR_DEVICE: Device selection (cuda, cpu)
  • MODEL_IDLE_TIMEOUT: Timeout for model unloading
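The variables above can be combined in a single `docker run` invocation. The sketch below is a hypothetical configuration, not a recommended default: the model path and timeout value are illustrative, and the assumption that `MODEL_IDLE_TIMEOUT` is measured in seconds should be checked against the project documentation.

```shell
# Hypothetical combined configuration: Faster Whisper on GPU with a
# persisted custom model path and an idle unload timeout
# (300 assumed to mean seconds -- verify in the docs).
docker run -d --gpus all -p 9000:9000 \
  -e ASR_ENGINE=faster_whisper \
  -e ASR_MODEL=large-v3 \
  -e ASR_MODEL_PATH=/data/models \
  -e MODEL_IDLE_TIMEOUT=300 \
  -v "$PWD/models:/data/models" \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
```

Mounting the host `models` directory at the custom `ASR_MODEL_PATH` keeps downloaded models out of the container layer, in the same spirit as the cache mount shown earlier.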

Documentation

For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice

Development

# Install poetry
pip3 install poetry

# Install dependencies
poetry install

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.

Credits

  • This software uses libraries from the FFmpeg project under the LGPLv2.1