zweck/whisper-toggle

whisper-toggle

Press once to record. Press again to transcribe and paste.

License: MIT AUR whisper.cpp

Single-keybinding speech-to-text for Linux desktops.
Pure bash. Local inference. No cloud. No network latency.


Super+`  -->  🎙️ Recording...  -->  Super+`  -->  📋 Pasted!

How It Works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  1st press   │────>│  Recording   │────>│ Transcribing │────>│   Pasted    │
│  Super + `   │     │  (sox/rec)   │     │ (whisper.cpp)│     │ (clipboard) │
└─────────────┘     └──────┬──────┘     └─────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │  2nd press   │  (or silence auto-stops)
                    │  Super + `   │
                    └─────────────┘

The server backend loads the model while you speak — by the time you stop talking, inference is nearly instant.
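
The press-once, press-again behaviour can be pictured as a small state check keyed on a PID file. The following is a minimal sketch under assumptions, not the project's actual script: the `PID_FILE` location and the function names `toggle_state` and `next_action` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the toggle decision.
# PID_FILE and both function names are illustrative assumptions.
PID_FILE="${PID_FILE:-/dev/shm/whisper-toggle-demo.pid}"

# Print "recording" if a live recorder PID is stored, otherwise "idle".
toggle_state() {
  local pid
  if [[ -r "$PID_FILE" ]] && pid=$(<"$PID_FILE") && [[ -n "$pid" ]] \
      && kill -0 "$pid" 2>/dev/null; then
    echo recording
  else
    echo idle
  fi
}

# Map the current state to what the next key press should do.
next_action() {
  case "$(toggle_state)" in
    recording) echo stop-and-transcribe ;;
    *)         echo start-recording ;;
  esac
}
```

A stale PID file (recorder already exited) is treated as idle, so a crashed recording does not block the next key press.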

Quick Start

# Install from AUR
yay -S whisper-toggle

# Interactive setup: GPU, model, backend, keybinding
whisper-toggle-setup

whisper.cpp required — install via AUR (yay -S whisper.cpp-cuda) or build from source. The setup wizard will guide you.

Features

  • Single keybinding — toggle recording on/off, transcription auto-pastes
  • Dual backend — on-demand whisper-server (recommended) or direct whisper-cli
  • GPU accelerated — CUDA, ROCm, Vulkan, or CPU fallback
  • X11 + Wayland — auto-detects session, uses the right clipboard/paste tools
  • Interactive setup — detects GPU, downloads models, configures your WM
  • XDG compliant — config in ~/.config/, models in ~/.local/share/, temp in /dev/shm/
  • No daemon — server starts and stops per-transcription, zero background footprint
  • Smart silence detection — recording stops automatically when you stop speaking
  • Post-processing — strips non-speech markers, trims whitespace, capitalizes
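
The post-processing step listed above could look roughly like this. It is a sketch only: the function name `clean_transcript` is hypothetical, and treating every bracketed or parenthesised span as a non-speech marker is an assumption about what the script strips.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of transcript clean-up (requires bash 4+ for ${var^}).
clean_transcript() {
  local text="$1"
  # Drop bracketed or parenthesised markers such as [BLANK_AUDIO] or (music).
  # Assumption: any such span is a marker, which is deliberately aggressive.
  text=$(sed -E 's/\[[^]]*\]//g; s/\([^)]*\)//g' <<<"$text")
  # Collapse runs of whitespace (including newlines) and trim both ends.
  text=$(tr -s '[:space:]' ' ' <<<"$text" | sed -E 's/^ +//; s/ +$//')
  # Capitalize the first character.
  printf '%s\n' "${text^}"
}
```

For example, `clean_transcript "  [BLANK_AUDIO] hello world (music) "` prints `Hello world`.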

Dependencies

Required
Required:  bash, sox, curl, jq, libnotify, libpulse
X11:       xsel, xdotool
Wayland:   wl-clipboard, ydotool
Optional:  pciutils (GPU detection in setup wizard)

Keybindings

The setup wizard auto-detects your WM and offers to configure this for you.

i3
bindsym $mod+grave exec --no-startup-id whisper-toggle
sway
bindsym $mod+grave exec whisper-toggle
Hyprland
bind = $mainMod, grave, exec, whisper-toggle
GNOME

Configured via whisper-toggle-setup using gsettings, or manually:
Settings > Keyboard > Custom Shortcuts

KDE

System Settings > Shortcuts > Custom Shortcuts > Add > Command/URL

Configuration

~/.config/whisper-toggle/whisper-toggle.conf

BACKEND="server"              # "server" or "cli"
WHISPER_SERVER=""              # Path to whisper-server (auto-detected)
WHISPER_CLI=""                 # Path to whisper-cli (auto-detected)
WHISPER_MODEL="~/.local/share/whisper-toggle/models/ggml-small.en.bin"
WHISPER_PORT=58080             # Server backend port
WHISPER_DEVICE=0               # GPU index (-1 for CPU-only)
WHISPER_THREADS=4              # CPU threads for inference
WHISPER_LANGUAGE="en"          # Language code or "auto"
AUTOPASTE=1                    # Auto-paste after transcription
SILENCE_DURATION=3.0           # Seconds of silence before auto-stop
SILENCE_THRESHOLD=3            # Silence sensitivity (%)
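
One way a script might consume these settings is sketched below. It assumes the file is plain shell assignments that can be sourced, which the format above suggests but does not confirm; `load_config` and `rec_silence_args` are hypothetical names. Note that bash does not expand a quoted `~`, so the sketch expands a leading tilde by hand. The `silence` arguments follow sox's documented effect syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: load whisper-toggle.conf over built-in defaults.
load_config() {
  local conf="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/whisper-toggle/whisper-toggle.conf}"
  # Defaults mirror the values documented above.
  BACKEND="server"
  WHISPER_MODEL="~/.local/share/whisper-toggle/models/ggml-small.en.bin"
  WHISPER_PORT=58080
  SILENCE_DURATION=3.0
  SILENCE_THRESHOLD=3
  # Assumption: the file is shell assignments, so sourcing overrides defaults.
  # shellcheck disable=SC1090
  [[ -r "$conf" ]] && source "$conf"
  # A quoted "~" is not expanded by bash; expand a leading one manually.
  WHISPER_MODEL="${WHISPER_MODEL/#\~/$HOME}"
  return 0
}

# Print sox "silence" effect arguments, one per line:
# start on 0.1 s of sound above the threshold, stop after
# SILENCE_DURATION seconds below it.
rec_silence_args() {
  printf '%s\n' silence 1 0.1 "${SILENCE_THRESHOLD}%" \
    1 "${SILENCE_DURATION}" "${SILENCE_THRESHOLD}%"
}
```

With the defaults, the resulting recorder call would be along the lines of `rec -q -t wav out.wav rate 16k silence 1 0.1 3% 1 3.0 3%`.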

Models

Model            Size     Speed   Quality   Best For
tiny.en          75 MB    ⚡⚡⚡⚡    ★★        Quick notes, low-end hardware
base.en          142 MB   ⚡⚡⚡     ★★★       Everyday use
small.en         466 MB   ⚡⚡      ★★★★      Recommended
medium.en        1.5 GB   -       ★★★★★     High accuracy needs
large-v3-turbo   1.6 GB   ⚡⚡      ★★★★★     Best speed/accuracy ratio
large-v3         3.1 GB   -       ★★★★★     Maximum accuracy

Models are downloaded by the setup wizard to ~/.local/share/whisper-toggle/models/.

Backends

server (recommended)

[key press] ──> whisper-server starts ──> model loads ──> ┐
                recording starts ──────> audio captured ──> inference ──> paste
                                                           server killed

The model loads in parallel with your speech. No persistent daemon — the server is started and killed per use.
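
The round trip to the server can be sketched with curl and jq, both of which are listed as dependencies. This is an illustration, not the script's actual code: `wait_for_server` and `transcribe` are hypothetical names, and the sketch assumes the server answers HTTP only once the model is loaded. The `/inference` endpoint with a multipart `file` field and `response_format=json` is the interface documented for the whisper.cpp server example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the server backend's HTTP round trip.

# Poll until something answers HTTP on the port, or give up after N tries.
wait_for_server() {
  local port="$1" tries="${2:-50}" i
  for ((i = 0; i < tries; i++)); do
    curl -s -o /dev/null "http://127.0.0.1:${port}/" && return 0
    sleep 0.1
  done
  return 1
}

# Upload a WAV file and print the transcribed text.
transcribe() {
  local port="$1" wav="$2"
  curl -s "http://127.0.0.1:${port}/inference" \
    -F "file=@${wav}" -F "response_format=json" | jq -r '.text'
}
```

A caller would start `whisper-server -m "$WHISPER_MODEL" --port "$WHISPER_PORT" &` at the first key press, then run `wait_for_server "$WHISPER_PORT" && transcribe "$WHISPER_PORT" recording.wav` after the second, and kill the server afterwards.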

cli

[key press] ──> recording starts ──> audio captured ──> whisper-cli runs ──> paste

Simpler, but slower — the model loads after you stop speaking.
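
For comparison, a direct whisper-cli call might be assembled from the config values as below. This is a sketch under assumptions: `cli_args` is a hypothetical helper, and mapping `WHISPER_DEVICE=-1` to `--no-gpu` is a guess at how the script handles CPU-only mode. The flags themselves (`-m`, `-f`, `-l`, `-t`, `-nt`, `-np`, `--no-gpu`) are taken from whisper-cli's help output; check `whisper-cli --help` for your build.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print whisper-cli arguments, one per line.
cli_args() {
  local wav="$1"
  # -nt drops timestamps and -np silences progress output,
  # so stdout carries only the transcript.
  printf '%s\n' -m "${WHISPER_MODEL:-}" -f "$wav" \
    -l "${WHISPER_LANGUAGE:-en}" -t "${WHISPER_THREADS:-4}" \
    -nt -np
  # Assumption: device -1 means CPU-only, mapped here to --no-gpu.
  [[ "${WHISPER_DEVICE:-0}" == -1 ]] && printf '%s\n' --no-gpu
  return 0
}
```

Usage would be `mapfile -t args < <(cli_args recording.wav); whisper-cli "${args[@]}"`.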

GPU Setup

The setup wizard detects your GPU(s) via lspci and recommends the right device.

GPU      whisper.cpp Package   Build Flag
NVIDIA   whisper.cpp-cuda      -DGGML_CUDA=ON
AMD      whisper.cpp-hip       -DGGML_HIP=ON
Any      whisper.cpp-vulkan    -DGGML_VULKAN=ON
None     whisper.cpp           (default)
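
The mapping in the table can be approximated from lspci output as follows. This is a rough sketch, not the wizard's actual detection logic: `suggest_package` is a hypothetical name and the vendor matching is a simple text search.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read lspci-style text on stdin and print the
# whisper.cpp package from the table above.
suggest_package() {
  local gpus
  # Keep only display-class devices.
  gpus=$(grep -Ei 'VGA compatible controller|3D controller|Display controller' || true)
  if grep -qi 'nvidia' <<<"$gpus"; then
    echo whisper.cpp-cuda
  elif grep -Eqi 'AMD|Advanced Micro Devices' <<<"$gpus"; then
    echo whisper.cpp-hip
  elif [[ -n "$gpus" ]]; then
    echo whisper.cpp-vulkan   # some other GPU: fall back to Vulkan
  else
    echo whisper.cpp          # no GPU found: CPU build
  fi
}
```

Run it as `lspci | suggest_package`.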

Troubleshooting

No sound recorded

Check that PulseAudio/PipeWire is running and rec can access your mic:

rec -q -t wav /tmp/test.wav rate 16k

Server failed to start

Check the log for whisper-server errors:

cat /tmp/whisper-toggle.log

Common causes: wrong GPU device index, missing CUDA/Vulkan drivers, model file not found.

Nothing pastes

X11: Install xsel and xdotool
Wayland: Install wl-clipboard and ydotool, ensure ydotoold is running
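
To check which tools are missing for your session, something like the following can help. It is a diagnostic sketch rather than the script's own detection code: `detect_session` and `missing_paste_tools` are hypothetical names, and the environment variables checked are the conventional ones. `wl-copy` is the binary shipped by wl-clipboard.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: guess the session type from standard variables.
detect_session() {
  if [[ "${XDG_SESSION_TYPE:-}" == wayland || -n "${WAYLAND_DISPLAY:-}" ]]; then
    echo wayland
  elif [[ "${XDG_SESSION_TYPE:-}" == x11 || -n "${DISPLAY:-}" ]]; then
    echo x11
  else
    echo unknown
  fi
}

# Print each clipboard/paste tool that is not on PATH for the session.
missing_paste_tools() {
  local tool wanted=()
  case "$1" in
    wayland) wanted=(wl-copy ydotool) ;;
    x11)     wanted=(xsel xdotool) ;;
  esac
  for tool in "${wanted[@]}"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}
```

Running `missing_paste_tools "$(detect_session)"` prints nothing when every tool is installed.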

Double triggers

The 500ms debounce should prevent this. If your WM sends multiple key events, use exec --no-startup-id (i3) or increase the debounce in the script.
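
A timestamp-based debounce of this kind can be sketched as below. The names `DEBOUNCE_MS`, `STAMP_FILE`, and `debounce_ok` are hypothetical and may not match the variables in the actual script; `date +%s%3N` requires GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a 500 ms debounce using a timestamp file.
DEBOUNCE_MS=500
STAMP_FILE="${STAMP_FILE:-/dev/shm/whisper-toggle-demo.stamp}"

# Return 0 if the press should be handled, 1 if it arrived too soon.
debounce_ok() {
  local now last=0
  now=$(date +%s%3N)   # milliseconds since the epoch (GNU date)
  [[ -r "$STAMP_FILE" ]] && last=$(<"$STAMP_FILE")
  last=${last:-0}      # guard against an empty stamp file
  if (( now - last < DEBOUNCE_MS )); then
    return 1
  fi
  printf '%s\n' "$now" >"$STAMP_FILE"
  return 0
}
```

Raising `DEBOUNCE_MS` widens the window in which repeated key events are ignored.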

whisper-server / whisper-cli not found

The script searches ~/whisper.cpp/build/bin/, ~/.local/bin/, /usr/local/bin/, and /usr/bin/. You can also set WHISPER_SERVER / WHISPER_CLI explicitly in the config.
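
The lookup order described above amounts to a first-match search. A minimal sketch, with `find_binary` and `SEARCH_DIRS` as hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the first executable named $1 found in
# the directories given as the remaining arguments.
find_binary() {
  local name="$1" dir
  shift
  for dir in "$@"; do
    if [[ -x "$dir/$name" && ! -d "$dir/$name" ]]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

SEARCH_DIRS=("$HOME/whisper.cpp/build/bin" "$HOME/.local/bin" /usr/local/bin /usr/bin)
# An explicit WHISPER_SERVER setting wins; otherwise fall back to the search.
WHISPER_SERVER="${WHISPER_SERVER:-$(find_binary whisper-server "${SEARCH_DIRS[@]}" || true)}"
```

If `WHISPER_SERVER` is still empty afterwards, the binary is in none of the listed directories and the path has to be set in the config.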


MIT License — built with whisper.cpp
