Speech2Text provides an easy-to-use graphical user interface (GUI) for several automatic speech recognition (ASR) systems and services based on OpenAI's Whisper: openai-whisper, whisper.cpp, mlx-whisper, faster-whisper, the Whisper ASR webservice, and the OpenAI Whisper API. The application transcribes or translates the speech in audio and video files. The output is a text file or a subtitle file (.vtt or .srt). When you select openai-whisper, mlx-whisper, whisper.cpp, or faster-whisper, ASR runs locally on your computer.
Note that mlx-whisper (available only on Macs with Apple silicon, i.e. M1 or later) and whisper.cpp are much faster than OpenAI's whisper. Speech2Text can also send the audio or video file to a remote computer running the Whisper ASR webservice, or use OpenAI's Whisper API, which performs ASR on OpenAI's servers.
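For the remote option, the sketch below shows what a request to a Whisper ASR webservice can look like. It assumes the server runs the open-source whisper-asr-webservice project, which exposes a `POST /asr` endpoint (default port 9000) that takes the file in an `audio_file` form field and `task`/`output` query parameters. The helper names `asr_url` and `send_to_asr` are hypothetical, and Speech2Text may build its requests differently.

```shell
#!/bin/sh
# Hypothetical helpers for a Whisper ASR webservice (assumed to be the
# whisper-asr-webservice project with its POST /asr endpoint).

# Print the request URL.
# $1 = server base URL, $2 = task (transcribe|translate), $3 = output (txt|vtt|srt)
asr_url() {
  printf '%s/asr?task=%s&output=%s\n' "${1%/}" "$2" "$3"
}

# Upload a media file and save the response next to it as <name>.<output>.
# $1 = server base URL, $2 = media file, $3 = task, $4 = output format
# --fail makes curl exit with an error (and skip the body) on HTTP errors.
send_to_asr() {
  curl -sS --fail -X POST -F "audio_file=@$2" \
    "$(asr_url "$1" "$3" "$4")" -o "${2%.*}.$4"
}

# Example (needs a running server):
# send_to_asr http://localhost:9000 talk.mp4 transcribe srt   # writes talk.srt
```

Keeping the URL construction in its own function makes it easy to switch between `transcribe` and `translate` or between subtitle formats without touching the upload call.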
To achieve the best accuracy, select one of the 'large' models in the Settings (e.g. large-v2 or large-v3-turbo).
Binaries for macOS and Windows can be downloaded at https://waltervanheuven.net/s2t/
On macOS, use Homebrew (brew) to install Python 3.12, uv, and ffmpeg.
brew install python@3.12
brew install uv
brew install ffmpeg
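Before continuing, you can check that these tools are on your PATH. This is a minimal POSIX shell sketch: `missing_tools` is a hypothetical helper, not part of Speech2Text, and it assumes Homebrew exposes Python 3.12 as `python3.12`.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any commands not found on PATH.
# Empty output means every command was found.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      # Append with a separating space only when the list is non-empty.
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
}

# Check the tools installed with brew above.
absent="$(missing_tools python3.12 uv ffmpeg)"
if [ -n "$absent" ]; then
  echo "Missing: $absent" >&2
else
  echo "All prerequisites found."
fi
```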
Clone speech2text.
git clone https://github.com/waltervanheuven/speech2text.git
cd speech2text
Set up venv and install packages using uv.
# venv
uv venv --python 3.12.9
source .venv/bin/activate
# install packages
uv pip install -U pip setuptools wheel
uv pip install -r requirements.txt
# create folder for the whisper.cpp binary (whisper-cli)
mkdir bin
mkdir bin/metal
# Further build instructions: https://github.com/ggerganov/whisper.cpp
mkdir tmp
cd tmp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build
cmake --build build --config Release
cp build/bin/whisper-cli ../../bin/metal/whisper-cli
cd ../..
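To try the freshly built binary outside the GUI, the following sketch converts a media file to 16 kHz mono 16-bit WAV with ffmpeg (the input format used in whisper.cpp's examples) and asks `whisper-cli` for an .srt file. It is an illustration under assumptions, not how Speech2Text itself calls whisper.cpp: the model path `models/ggml-large-v3-turbo.bin` and the helper names `run` and `transcribe_to_srt` are hypothetical, and you need to download a ggml model first (see the whisper.cpp README). Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch: transcribe a media file to .srt with whisper.cpp.
# Run from the speech2text folder. WHISPER_CLI and MODEL are assumed paths.
WHISPER_CLI="${WHISPER_CLI:-bin/metal/whisper-cli}"
MODEL="${MODEL:-models/ggml-large-v3-turbo.bin}"

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "$DRY_RUN" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = input audio/video file; writes <name>.16k.wav and <name>.srt
transcribe_to_srt() {
  input="$1"
  base="${input%.*}"
  wav="$base.16k.wav"
  if [ -z "$DRY_RUN" ] && [ ! -x "$WHISPER_CLI" ]; then
    echo "whisper-cli not found or not executable: $WHISPER_CLI" >&2
    return 1
  fi
  # Convert to 16 kHz, mono, 16-bit PCM WAV, then run whisper-cli with
  # -l auto (detect language), -osrt (write SRT), -of (output path without extension).
  run ffmpeg -y -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" &&
    run "$WHISPER_CLI" -m "$MODEL" -l auto -f "$wav" -osrt -of "$base"
}

# Example:
# transcribe_to_srt talk.mp4            # writes talk.srt
# DRY_RUN=1 transcribe_to_srt talk.mp4  # prints the two commands
```

The intermediate file gets a `.16k.wav` suffix so that a WAV input is never used as its own ffmpeg output.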
On Windows, use Scoop to install Python 3.12, uv, and ffmpeg.
scoop update
scoop bucket add versions
scoop install python312
scoop install main/uv
scoop install ffmpeg
Clone speech2text.
git clone https://github.com/waltervanheuven/speech2text.git
cd speech2text
Set up venv and install packages using uv.
uv venv --python 3.12.9
.venv\Scripts\activate
uv pip install -U pip setuptools wheel
uv pip install -r requirements.txt
# create folder for the whisper.cpp binary (whisper-cli.exe)
mkdir bin
mkdir bin/cuda
# Build instructions: https://github.com/ggerganov/whisper.cpp
# or download prebuilt binaries and place `whisper-cli.exe` and the `*.dll` files in the `bin` folder
Start Speech2Text from the speech2text folder with the virtual environment activated.
python src/Speech2Text.py