Speech2Text

Speech2Text provides an easy-to-use graphical user interface (GUI) for automatic speech recognition (ASR) systems and services based on OpenAI's Whisper: openai-whisper, whisper.cpp, mlx-whisper, faster-whisper, the Whisper ASR webservice, and the Whisper API. The application transcribes or translates the speech in audio and video files and writes the result to a text file or a subtitle file (.vtt or .srt). When you select openai-whisper, mlx-whisper, whisper.cpp, or faster-whisper, ASR runs locally on your computer.
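To illustrate what the subtitle output looks like, here is a minimal sketch of rendering timed segments as an .srt file. It is not taken from the Speech2Text source; the function names `srt_timestamp` and `to_srt` and the example segments are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as the text of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments, not real transcription output.
example = [(0.0, 2.5, "Hello and welcome."), (2.5, 6.0, "This is a test.")]
print(to_srt(example))
```

A .vtt (WebVTT) file is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.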

Note that mlx-whisper (available only on Apple silicon Macs, i.e. M1 or later) and whisper.cpp are much faster than openai-whisper. Speech2Text can also send the audio or video file to a remote computer running the Whisper ASR webservice, or use OpenAI's Whisper API, which performs ASR on OpenAI's servers.
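As a rough sketch of what a request to a remote Whisper ASR webservice might look like, the snippet below builds the request URL. It assumes the server exposes an `/asr` endpoint with `task`, `output`, and `language` query parameters and listens on port 9000; these are assumptions about the webservice, not details taken from Speech2Text, so check the documentation of the server you are running. The helper name `build_asr_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_asr_url(base_url: str, task: str = "transcribe",
                  output: str = "srt", language: Optional[str] = None) -> str:
    """Build the URL for an ASR request (assumed /asr endpoint)."""
    params = {"task": task, "output": output}
    if language:
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


# Assumed host and port; replace with the address of your own server.
print(build_asr_url("http://localhost:9000", task="translate", output="vtt"))
```

The audio or video file itself would then be uploaded to this URL as multipart form data in a POST request.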

To achieve the best accuracy, select one of the 'large' models in the Settings (e.g. large-v2 or large-v3-turbo).

Download and install binaries

Binaries for macOS and Windows can be downloaded at https://waltervanheuven.net/s2t/

Run on macOS

Use Homebrew to install Python 3.12, uv, and ffmpeg.

brew install python@3.12
brew install uv
brew install ffmpeg
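To confirm that the tools are on your PATH before continuing, a small check along these lines can help. This is an illustrative sketch, not part of Speech2Text; the helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(["python3", "uv", "ffmpeg"]).items():
    print(f"{name}: {path or 'NOT FOUND'}")
```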

Clone speech2text.

git clone https://github.com/waltervanheuven/speech2text.git
cd speech2text

Set up venv and install packages using uv.

# venv
uv venv --python 3.12.9
source .venv/bin/activate

# install packages
uv pip install -U pip setuptools wheel
uv pip install -r requirements.txt

Build and install whisper.cpp on macOS

# create folder for whisper.cpp
mkdir bin
mkdir bin/metal

# The build requires cmake (e.g. brew install cmake).
# Further build instructions: https://github.com/ggerganov/whisper.cpp
mkdir tmp
cd tmp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build
cmake --build build --config Release
cp build/bin/whisper-cli ../../bin/metal/whisper-cli
cd ../..
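To check the freshly built binary outside the GUI, you could run it by hand. The sketch below only assembles the two commands involved: converting the input to 16 kHz mono WAV with ffmpeg, then calling `whisper-cli`. The model path and file names are hypothetical placeholders (a model file has to be downloaded separately), and this is not necessarily how Speech2Text itself invokes the binary.

```python
import shlex


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list:
    """ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def whisper_cli_cmd(cli: str, model: str, wav: str,
                    translate: bool = False) -> list:
    """whisper-cli command that writes an .srt file for the given WAV."""
    cmd = [cli, "--model", model, "--file", wav, "--output-srt"]
    if translate:
        cmd.append("--translate")  # translate the speech to English
    return cmd


# Hypothetical paths for illustration only.
print(shlex.join(ffmpeg_to_wav_cmd("talk.mp4", "talk.wav")))
print(shlex.join(whisper_cli_cmd("bin/metal/whisper-cli",
                                 "models/ggml-base.en.bin", "talk.wav")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.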

Run on Windows

Use Scoop to install Python 3.12, uv, and ffmpeg.

scoop update
scoop bucket add versions
scoop install python312
scoop install main/uv
scoop install ffmpeg

Clone speech2text.

git clone https://github.com/waltervanheuven/speech2text.git
cd speech2text

Set up venv and install packages using uv.

uv venv --python 3.12.9
# PowerShell or cmd (in Git Bash use: source .venv/Scripts/activate)
.venv\Scripts\activate

uv pip install -U pip setuptools wheel
uv pip install -r requirements.txt

Build and install whisper.cpp on Windows

# create folder for whisper.cpp
mkdir bin
mkdir bin/cuda

# build instructions: https://github.com/ggerganov/whisper.cpp
# or download binaries and place `whisper-cli.exe` and `*.dll` in folder `bin`

Start app in venv

python src/Speech2Text.py
