Susurrus is a flexible audio transcription frontend that converts speech to text using a range of AI models, most of them based on OpenAI Whisper, and a choice of backends. It transcribes local audio files as well as online content, with selectable models and pipelines.
Features:
- Support for multiple transcription backends (mlx-whisper, OpenAI Whisper, faster-whisper, transformers, whisper.cpp, ctranslate2, whisper-jax, insanely-fast-whisper)
- Audio file upload and URL input support
- YouTube audio extraction and transcription
- Proxy support for network requests
- Language selection for targeted transcription
- Transcription metrics and progress tracking
- Graphical user interface
- Advanced options including start/end time for transcription, max chunk length, and output format selection for whisper.cpp (enabling subtitle export)
- Audio trimming functionality
Prerequisites:
- Python 3.8 or higher
- pip (Python package manager)
- Git
- C++ compiler (for whisper.cpp)
- CMake (for whisper.cpp)
- FFmpeg
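Before installing, it can help to confirm that the tools listed above are actually reachable. The following is a minimal sketch and not part of Susurrus: the function name `missing_prerequisites` is hypothetical, and the executable names (`git`, `cmake`, `ffmpeg`) are assumed to match what is on your PATH. The C++ compiler check is left out because the compiler's name differs between platforms.

```python
import shutil
import sys

# Executable names are assumptions; adjust them for your platform.
REQUIRED_TOOLS = ("git", "cmake", "ffmpeg")
MIN_PYTHON = (3, 8)


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; empty if every check passes."""
    problems = []
    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    # shutil.which returns None when an executable is not found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the script prints one line per missing prerequisite and nothing if all checks pass.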
- Clone the repository:
git clone https://github.com/CrispStrobe/susurrus.git
cd susurrus
- Create and activate a virtual environment:
- macOS/Linux:
python3 -m venv venv
source venv/bin/activate
- Windows:
python -m venv venv
venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Install additional backend-specific packages:
pip install openai-whisper faster-whisper transformers ctranslate2 whisper-jax soundfile insanely-fast-whisper
- Install whisper.cpp:
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
cd ../..
Or, on Windows:
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
mkdir build && cd build
# Configure with UTF-8 support
cmake -B . -DCMAKE_CXX_FLAGS="/utf-8" -DCMAKE_BUILD_TYPE=Release ..
# Build
cmake --build . --config Release
cd ../..
- Install FFmpeg:
- macOS:
brew install ffmpeg
- Linux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install ffmpeg
- Windows:
- Download FFmpeg from https://ffmpeg.org/download.html
- Extract the downloaded archive and add the bin folder to your system PATH
- Ensure you have a C++ compiler installed. You can use Visual Studio with C++ support or MinGW-w64.
- Install CMake from https://cmake.org/download/ and add it to your system PATH.
- Activate the virtual environment (if not already activated):
- macOS/Linux:
source venv/bin/activate
- Windows:
venv\Scripts\activate
- Run the main application:
python susurrus.py
- Use the graphical interface to:
- Upload an audio file or provide a URL
- Select the desired transcription backend and model
- Configure advanced options if needed
- Start the transcription process
- View the transcription results and metrics in the application window
- Save the transcription to a text file using the "Save" button
The transcription worker script can be run separately for debugging or advanced usage:
python transcribe_worker.py --audio-input <audio_file> --audio-url <url> --model-id <model_id> --word-timestamps --language <lang> --backend <backend> --device <device> --pipeline-type <type> --max-chunk-length <length> --output-format <format> --quantization <quant_type> --batch-size <size> --preprocessor-path <path> --original-model-id <orig_id> --start-time <start> --end-time <end>
Example:
python transcribe_worker.py --audio-input input.wav --model-id mlx-community/whisper-large-v3-mlx --word-timestamps --language en --backend mlx-whisper --device auto --pipeline-type default --start-time 10 --end-time 60
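When driving the worker from another script, assembling the arguments as a list avoids shell quoting problems. The sketch below is an illustration only: `build_worker_command` is a hypothetical helper, it covers just a subset of the flags shown above, and it assumes `transcribe_worker.py` is in the current directory. How the worker reports its results is not described here, so the sketch stops at building the command.

```python
import shlex
import sys


def build_worker_command(audio_input, model_id, backend, language=None,
                         start_time=None, end_time=None,
                         word_timestamps=False,
                         script="transcribe_worker.py"):
    """Assemble an argument list for the worker script (hypothetical helper)."""
    # Use the current interpreter so the active virtual environment applies.
    cmd = [sys.executable, script,
           "--audio-input", audio_input,
           "--model-id", model_id,
           "--backend", backend]
    # Optional flags are appended only when a value was supplied.
    if language:
        cmd += ["--language", language]
    if word_timestamps:
        cmd.append("--word-timestamps")
    if start_time is not None:
        cmd += ["--start-time", str(start_time)]
    if end_time is not None:
        cmd += ["--end-time", str(end_time)]
    return cmd


if __name__ == "__main__":
    cmd = build_worker_command(
        "input.wav", "mlx-community/whisper-large-v3-mlx", "mlx-whisper",
        language="en", start_time=10, end_time=60, word_timestamps=True)
    # Print the command for inspection instead of launching it.
    print(shlex.join(cmd))
```

To actually launch the worker, the returned list can be passed to `subprocess.run(cmd, check=True)`.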
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.