Susurrus: Whisper Audio Transcription GUI

Susurrus is a flexible audio transcription frontend that converts speech to text using a range of AI models (most of them based on OpenAI Whisper) and interchangeable backends. It transcribes local audio files as well as online content through a number of optional models and pipelines.

Features

  • Support for multiple transcription backends (mlx-whisper, OpenAI Whisper, faster-whisper, transformers, whisper.cpp, ctranslate2, whisper-jax, insanely-fast-whisper)
  • Audio file upload and URL input support
  • YouTube audio extraction and transcription
  • Proxy support for network requests
  • Language selection for targeted transcription
  • Transcription metrics and progress tracking
  • Graphical user interface
  • Advanced options including start/end time for transcription, max chunk length, and output format selection for whisper.cpp (enabling subtitle export)
  • Audio trimming functionality
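
  Trimming of the kind listed above is commonly done by handing FFmpeg a start offset and a duration. A minimal sketch of how such a command could be assembled (the function name and argument layout are illustrative, not Susurrus's actual code):

  ```python
  import subprocess

  def build_trim_command(src, dst, start, end):
      """Build an ffmpeg command that copies src between start and end (seconds) to dst.

      Placing -ss before -i makes ffmpeg seek quickly; -t limits the output
      duration; -c copy avoids re-encoding when the container allows it.
      """
      duration = end - start
      return [
          "ffmpeg", "-y",
          "-ss", str(start),    # seek to the trim start
          "-i", src,
          "-t", str(duration),  # keep only this many seconds
          "-c", "copy",         # stream copy, no re-encode
          dst,
      ]

  cmd = build_trim_command("talk.mp3", "talk_trimmed.mp3", 10, 60)
  # subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
  ```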

Screenshot

Susurrus Interface

Installation

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • Git
  • C++ compiler (for whisper.cpp)
  • CMake (for whisper.cpp)
  • FFmpeg

Common Steps (macOS, Linux, and Windows)

  1. Clone the repository:

    git clone https://github.com/CrispStrobe/susurrus.git
    cd susurrus
    
  2. Create and activate a virtual environment:

    • macOS/Linux:
      python3 -m venv venv
      source venv/bin/activate
      
    • Windows:
      python -m venv venv
      venv\Scripts\activate
      
  3. Install the required packages:

    pip install -r requirements.txt
    
  4. Install additional backend-specific packages:

    pip install openai-whisper faster-whisper transformers ctranslate2 whisper-jax soundfile insanely-fast-whisper
    
  5. Install whisper.cpp:

    git clone https://github.com/ggerganov/whisper.cpp.git
    cd whisper.cpp
    mkdir build && cd build
    cmake ..
    cmake --build . --config Release
    cd ../..
    

  or, for Windows:

    git clone https://github.com/ggerganov/whisper.cpp.git
    cd whisper.cpp
    mkdir build && cd build

    # Configure with UTF-8 support
    cmake -B . -DCMAKE_CXX_FLAGS="/utf-8" -DCMAKE_BUILD_TYPE=Release ..

    # Build
    cmake --build . --config Release
    cd ../..
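
  Once built, the whisper.cpp binary can also be driven directly from Python. A hedged sketch of assembling such an invocation (the binary location varies by whisper.cpp version, e.g. build/bin/whisper-cli in recent builds or build/bin/main in older ones, and the ggml model file comes from whisper.cpp's models/download-ggml-model.sh script):

  ```python
  from pathlib import Path

  def build_whispercpp_command(binary, model, audio, language="en"):
      """Assemble a whisper.cpp invocation from its standard flags:
      -m (model file), -l (language), -f (input audio file)."""
      return [str(Path(binary)), "-m", str(model), "-l", language, "-f", str(audio)]

  cmd = build_whispercpp_command("whisper.cpp/build/bin/whisper-cli",
                                 "models/ggml-base.en.bin", "input.wav")
  # subprocess.run(cmd, check=True)  # once the binary and model exist
  ```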
  6. Install FFmpeg:

    • macOS:
      brew install ffmpeg
      
    • Linux (Ubuntu/Debian):
      sudo apt-get update
      sudo apt-get install ffmpeg
      
    • Windows:
      Download a release build from https://ffmpeg.org/download.html, extract it, and add its bin folder to your system PATH.
Additional Steps for Windows

  • Ensure you have a C++ compiler installed. You can use Visual Studio with C++ support or MinGW-w64.
  • Install CMake from https://cmake.org/download/ and add it to your system PATH.
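
Since every backend depends on FFmpeg for audio decoding, a quick preflight check can save confusing errors later. A small sketch (not part of Susurrus itself):

```python
import shutil
import subprocess

def check_ffmpeg():
    """Return the ffmpeg version line, or None if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        return None
    out = subprocess.run(["ffmpeg", "-version"],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()[0]

print(check_ffmpeg() or "ffmpeg not found -- install it before transcribing")
```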

Usage

  1. Activate the virtual environment (if not already activated):

    • macOS/Linux: source venv/bin/activate
    • Windows: venv\Scripts\activate
  2. Run the main application:

    python susurrus.py
    
  3. Use the graphical interface to:

    • Upload an audio file or provide a URL
    • Select the desired transcription backend and model
    • Configure advanced options if needed
    • Start the transcription process
  4. View the transcription results and metrics in the application window

  5. Save the transcription to a text file using the "Save" button
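
Because the backends installed earlier are all optional, it can help to check which ones are actually importable before selecting one in the interface. A sketch using stdlib introspection (the module names are the usual import names for these packages; treat them as assumptions if a package renames its module):

```python
import importlib.util

# Map of backend labels to the Python module each one needs.
BACKEND_MODULES = {
    "openai whisper": "whisper",
    "faster-whisper": "faster_whisper",
    "transformers": "transformers",
    "ctranslate2": "ctranslate2",
    "whisper-jax": "whisper_jax",
    "mlx-whisper": "mlx_whisper",
}

def available_backends():
    """Return the backends whose Python module can be found without importing it."""
    return [name for name, mod in BACKEND_MODULES.items()
            if importlib.util.find_spec(mod) is not None]

print(available_backends())
```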

Running the Transcription Worker Script

The transcription worker script can be run separately for debugging or advanced usage:

python transcribe_worker.py --audio-input <audio_file> --audio-url <url> --model-id <model_id> --word-timestamps --language <lang> --backend <backend> --device <device> --pipeline-type <type> --max-chunk-length <length> --output-format <format> --quantization <quant_type> --batch-size <size> --preprocessor-path <path> --original-model-id <orig_id> --start-time <start> --end-time <end>

Example:

python transcribe_worker.py --audio-input input.wav --model-id mlx-community/whisper-large-v3-mlx --word-timestamps --language en --backend mlx-whisper --device auto --pipeline-type default --start-time 10 --end-time 60
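
When driving the worker from another script, the argument list can be built programmatically. A sketch using only flags shown in the synopsis above, spelled verbatim (the wrapper function itself is illustrative):

```python
import sys

def build_worker_command(audio_input, model_id, backend,
                         language=None, start_time=None, end_time=None,
                         word_timestamps=False):
    """Assemble a transcribe_worker.py command line from the documented flags."""
    cmd = [sys.executable, "transcribe_worker.py",
           "--audio-input", audio_input,
           "--model-id", model_id,
           "--backend", backend]
    if language:
        cmd += ["--language", language]
    if word_timestamps:
        cmd.append("--word-timestamps")
    if start_time is not None:
        cmd += ["--start-time", str(start_time)]
    if end_time is not None:
        cmd += ["--end-time", str(end_time)]
    return cmd

cmd = build_worker_command("input.wav", "mlx-community/whisper-large-v3-mlx",
                           "mlx-whisper", language="en",
                           start_time=10, end_time=60, word_timestamps=True)
# subprocess.run(cmd, check=True)
```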

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Acknowledgements
