Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper models are trained on a large dataset of diverse audio and are multitask models that can perform multilingual speech recognition, speech translation, and language identification.
The current release (v1.8.2) supports the following Whisper engines: OpenAI Whisper, Faster Whisper, and WhisperX.
To run the service on CPU:
docker run -d -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest
To run the service with GPU acceleration:
docker run -d --gpus all -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest-gpu
To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:
docker run -d -p 9000:9000 \
-v $PWD/cache:/root/.cache/ \
onerahmet/openai-whisper-asr-webservice:latest
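Because a fresh container may need time to download a model before it starts answering, a script that drives the service can poll it before sending audio. The following is a minimal sketch, not part of the project: `http_probe` and `wait_until_ready` are hypothetical helper names, and the timeout and interval defaults are arbitrary assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, request_timeout: float = 5.0) -> bool:
    """Return True if `url` answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is listening.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 120.0,          # assumed default, tune for your model size
    interval: float = 2.0,           # assumed default pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A call such as `wait_until_ready(lambda: http_probe("http://localhost:9000"))` would block until the container started above responds, or return `False` after the timeout. The probe is passed in as a callable so the retry loop can be exercised without a running service.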
Key features:
- Multiple ASR engine support (OpenAI Whisper, Faster Whisper, WhisperX)
- Multiple output formats (text, JSON, VTT, SRT, TSV)
- Word-level timestamp support
- Voice activity detection (VAD) filtering
- Speaker diarization (with WhisperX)
- FFmpeg integration for broad audio/video format support
- GPU acceleration support
- Configurable model loading/unloading
- REST API with Swagger documentation
Key configuration options:
- ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
- ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
- ASR_MODEL_PATH: Custom path to store/load models
- ASR_DEVICE: Device selection (cuda, cpu)
- MODEL_IDLE_TIMEOUT: Timeout for model unloading
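The options above are passed to the container as environment variables. As an illustration only, the sketch below assembles a `docker run` argument list from a dictionary of those variables and rejects engine or device values that are not in the lists above. `build_docker_command` is a hypothetical helper, not something the project ships, and the validation covers only the values named in this document.

```python
from typing import Dict, List, Optional

# Values taken from the configuration list above.
KNOWN_ENGINES = ("openai_whisper", "faster_whisper", "whisperx")
KNOWN_DEVICES = ("cuda", "cpu")
IMAGE = "onerahmet/openai-whisper-asr-webservice"


def build_docker_command(
    env: Dict[str, str],
    gpu: bool = False,
    port: int = 9000,
    cache_dir: Optional[str] = None,
) -> List[str]:
    """Build a `docker run` argument list for the ASR web service image."""
    engine = env.get("ASR_ENGINE")
    if engine is not None and engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown ASR_ENGINE: {engine!r}")
    device = env.get("ASR_DEVICE")
    if device is not None and device not in KNOWN_DEVICES:
        raise ValueError(f"unknown ASR_DEVICE: {device!r}")

    cmd = ["docker", "run", "-d"]
    if gpu:
        cmd += ["--gpus", "all"]
    # Map the chosen host port to the container port used in the examples.
    cmd += ["-p", f"{port}:9000"]
    if cache_dir is not None:
        # Persist the cache directory so models are not downloaded again.
        cmd += ["-v", f"{cache_dir}:/root/.cache/"]
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append(f"{IMAGE}:{'latest-gpu' if gpu else 'latest'}")
    return cmd
```

The returned list can be handed to `subprocess.run` directly, or joined with `shlex.join` to print a command line for review. Building a list rather than a string avoids shell quoting problems when a path contains spaces.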
For complete documentation, visit: https://ahmetoner.github.io/whisper-asr-webservice
# Install poetry
pip3 install poetry
# Install dependencies
poetry install
# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.
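The API can also be called from a script. The sketch below uses only the Python standard library to upload an audio file. It rests on assumptions that this document does not state and that should be checked against the Swagger UI: the endpoint path `/asr`, the multipart field name `audio_file`, and the query parameters `task`, `output`, and `language`. The function names are hypothetical.

```python
import os
import urllib.parse
import urllib.request
import uuid
from typing import Optional, Tuple


def build_asr_url(
    base_url: str,
    output: str = "json",
    task: str = "transcribe",
    language: Optional[str] = None,
) -> str:
    """Build the request URL; path and parameter names are assumptions."""
    params = {"task": task, "output": output}
    if language:
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urllib.parse.urlencode(params)}"


def encode_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:9000", output: str = "json") -> str:
    """Upload the file at `path` and return the raw response text."""
    with open(path, "rb") as handle:
        # "audio_file" is the assumed form field name.
        body, content_type = encode_multipart("audio_file", os.path.basename(path), handle.read())
    request = urllib.request.Request(
        build_asr_url(base_url, output=output),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running, `transcribe("sample.wav")` would return the response body as a string, where `sample.wav` stands for any local audio file. If the Swagger UI shows different parameter or field names, only `build_asr_url` and the field name passed to `encode_multipart` need to change.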