Press once to record. Press again to transcribe and paste.
Single-keybinding speech-to-text for Linux desktops.
Pure bash. Local inference. No cloud. No latency.
Super+` --> 🎙️ Recording... --> Super+` --> 📋 Pasted!
┌─────────────┐     ┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  1st press  │────>│  Recording  │────>│ Transcribing │────>│   Pasted    │
│  Super + `  │     │  (sox/rec)  │     │ (whisper.cpp)│     │ (clipboard) │
└─────────────┘     └──────┬──────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │  2nd press  │  (or silence auto-stops)
                    │  Super + `  │
                    └─────────────┘
The server backend loads the model while you speak — by the time you stop talking, inference is nearly instant.
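The press-once, press-again cycle in the diagram can be approximated with a PID file. This is a minimal sketch rather than the project's actual script: `PIDFILE`, `AUDIO`, `RECORDER`, `is_recording`, and `toggle` are hypothetical names, and the sox `silence` arguments assume the documented defaults of 3.0 seconds and 3%.

```bash
#!/usr/bin/env bash
# Hypothetical names throughout; paths are illustrative only.
PIDFILE="${PIDFILE:-/dev/shm/whisper-toggle-demo.pid}"
AUDIO="${AUDIO:-/dev/shm/whisper-toggle-demo.wav}"

# 16 kHz WAV capture. The sox "silence" effect starts on the first sound
# above 3% and stops after 3.0 s of audio below 3%.
RECORDER=(rec -q -t wav "$AUDIO" rate 16k silence 1 0.1 3% 1 3.0 3%)

# True when the PID file points at a process that is still alive.
is_recording() {
    [[ -f "$PIDFILE" ]] && kill -0 "$(<"$PIDFILE")" 2>/dev/null
}

toggle() {
    if is_recording; then
        # Second press: stop the recorder.
        kill -TERM "$(<"$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
        echo "stopped"
    else
        # First press: start the recorder in the background and remember its PID.
        "${RECORDER[@]}" >/dev/null 2>&1 &
        echo "$!" > "$PIDFILE"
        echo "recording"
    fi
}
```

A stale PID file (for example after silence auto-stopped the recorder) is treated as "not recording", so the next press starts a fresh capture. A real script would also trigger transcription at the point where this sketch prints "stopped".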
# Install from AUR
yay -S whisper-toggle
# Interactive setup: GPU, model, backend, keybinding
whisper-toggle-setup
whisper.cpp required: install via AUR (yay -S whisper.cpp-cuda) or build from source. The setup wizard will guide you.
- Single keybinding: toggle recording on/off, transcription auto-pastes
- Dual backend: on-demand whisper-server (recommended) or direct whisper-cli
- GPU accelerated: CUDA, ROCm, Vulkan, or CPU fallback
- X11 + Wayland: auto-detects session, uses the right clipboard/paste tools
- Interactive setup: detects GPU, downloads models, configures your WM
- XDG compliant: config in ~/.config/, models in ~/.local/share/, temp in /dev/shm/
- No daemon: server starts and stops per-transcription, zero background footprint
- Smart silence detection: recording stops automatically when you stop speaking
- Post-processing: strips non-speech markers, trims whitespace, capitalizes
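To make the post-processing step concrete, here is a small sketch of what such a cleanup could look like. The function name `postprocess` is hypothetical, and the assumption that non-speech markers appear as bracketed or parenthesized tokens (such as `[BLANK_AUDIO]` or `(music)`) is mine; the real script may use different rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clean up raw transcription text.
postprocess() {
    local text="$1"
    # Drop bracketed or parenthesized markers such as [BLANK_AUDIO] or (music).
    text="$(sed -E 's/\[[^]]*\]//g; s/\([^)]*\)//g' <<<"$text")"
    # Collapse every run of whitespace (including newlines) into one space.
    text="$(tr -s '[:space:]' ' ' <<<"$text")"
    # Trim leading and trailing whitespace with parameter expansion.
    text="${text#"${text%%[![:space:]]*}"}"
    text="${text%"${text##*[![:space:]]}"}"
    # Capitalize the first character (requires bash 4 or newer).
    printf '%s\n' "${text^}"
}
```

Note that this version also removes legitimate parenthesized speech, which is a trade-off a real implementation might handle with an explicit marker list instead.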
| Required | | |
|---|---|---|
| bash | sox | curl |
| jq | libnotify | libpulse |
| X11 | | |
| xsel | xdotool | |
| Wayland | | |
| wl-clipboard | ydotool | |
| Optional | | |
| pciutils (GPU detection in setup wizard) | | |
The setup wizard auto-detects your WM and offers to configure the keybinding for you.
i3
bindsym $mod+grave exec --no-startup-id whisper-toggle
sway
bindsym $mod+grave exec whisper-toggle
Hyprland
bind = $mainMod, grave, exec, whisper-toggle
GNOME
Configured via whisper-toggle-setup using gsettings, or manually:
Settings > Keyboard > Custom Shortcuts
KDE
System Settings > Shortcuts > Custom Shortcuts > Add > Command/URL
~/.config/whisper-toggle/whisper-toggle.conf
BACKEND="server" # "server" or "cli"
WHISPER_SERVER="" # Path to whisper-server (auto-detected)
WHISPER_CLI="" # Path to whisper-cli (auto-detected)
WHISPER_MODEL="~/.local/share/whisper-toggle/models/ggml-small.en.bin"
WHISPER_PORT=58080 # Server backend port
WHISPER_DEVICE=0 # GPU index (-1 for CPU-only)
WHISPER_THREADS=4 # CPU threads for inference
WHISPER_LANGUAGE="en" # Language code or "auto"
AUTOPASTE=1 # Auto-paste after transcription
SILENCE_DURATION=3.0 # Seconds of silence before auto-stop
SILENCE_THRESHOLD=3 # Silence sensitivity (%)

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| tiny.en | 75 MB | ⚡⚡⚡⚡ | ★★ | Quick notes, low-end hardware |
| base.en | 142 MB | ⚡⚡⚡ | ★★★ | Everyday use |
| small.en | 466 MB | ⚡⚡ | ★★★★ | Recommended |
| medium.en | 1.5 GB | ⚡ | ★★★★★ | High accuracy needs |
| large-v3-turbo | 1.6 GB | ⚡⚡ | ★★★★★ | Best speed/accuracy ratio |
| large-v3 | 3.1 GB | ⚡ | ★★★★★ | Maximum accuracy |
Models are downloaded by the setup wizard to ~/.local/share/whisper-toggle/models/.
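If you want to fetch a model by hand, something along these lines should work. It is a sketch under assumptions: the helper names are hypothetical, the `ggml-<name>.bin` naming follows the default `WHISPER_MODEL` path, and the URL is the Hugging Face location used by whisper.cpp's own download script, which may not be what the setup wizard uses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for manual model downloads.
MODEL_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/whisper-toggle/models"

# Local path for a model name such as "small.en".
model_path() { printf '%s/ggml-%s.bin\n' "$MODEL_DIR" "$1"; }

# Assumed download location (same host whisper.cpp's download script uses).
model_url() {
    printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%s.bin\n' "$1"
}

# Download a model unless a non-empty copy already exists.
fetch_model() {
    local name="$1" dest
    dest="$(model_path "$name")"
    [[ -s "$dest" ]] && return 0
    mkdir -p "$MODEL_DIR"
    # Write to a .part file first so an interrupted download is never used.
    curl -fL --progress-bar -o "$dest.part" "$(model_url "$name")" &&
        mv "$dest.part" "$dest"
}
```

For example, `fetch_model small.en` would place the recommended model where the default configuration expects it.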
[key press] ──> whisper-server starts ──> model loads ──> ┐
                recording starts ──────> audio captured ──> inference ──> paste
                                                            server killed
The model loads in parallel with your speech. No persistent daemon — the server is started and killed per use.
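The overlap between model loading and recording can be sketched as follows. This is a simplified, single-process illustration, not the project's script: the function names are hypothetical, recording here ends only through silence auto-stop, and it assumes the server starts accepting connections once it is ready. The `/inference` endpoint and its `file` and `response_format` form fields come from the whisper.cpp server example.

```bash
#!/usr/bin/env bash
# Variable names mirror the config keys; function names are hypothetical.
WHISPER_SERVER="${WHISPER_SERVER:-whisper-server}"
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/.local/share/whisper-toggle/models/ggml-small.en.bin}"
WHISPER_PORT="${WHISPER_PORT:-58080}"
WHISPER_THREADS="${WHISPER_THREADS:-4}"

# Return 0 once something accepts TCP connections on the port, 1 on timeout.
wait_for_port() {
    local port="$1" tries="${2:-100}"
    while (( tries-- > 0 )); do
        if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
            return 0
        fi
        sleep 0.1
    done
    return 1
}

transcribe_with_server() {
    local wav="$1" server_pid text=""
    # Start the server first so the model loads while the user is speaking.
    "$WHISPER_SERVER" -m "$WHISPER_MODEL" --host 127.0.0.1 \
        --port "$WHISPER_PORT" -t "$WHISPER_THREADS" >/dev/null 2>&1 &
    server_pid=$!
    # Blocks until sox sees 3.0 s of silence below 3%.
    rec -q -t wav "$wav" rate 16k silence 1 0.1 3% 1 3.0 3%
    if wait_for_port "$WHISPER_PORT"; then
        text="$(curl -s "http://127.0.0.1:$WHISPER_PORT/inference" \
            -F "file=@$wav" -F "response_format=json" | jq -r '.text')"
    fi
    # No persistent daemon: the server goes away after each transcription.
    kill "$server_pid" 2>/dev/null
    printf '%s\n' "$text"
}
```

Polling the port with bash's `/dev/tcp` keeps the readiness check free of extra dependencies; a real implementation might probe an HTTP endpoint instead.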
[key press] ──> recording starts ──> audio captured ──> whisper-cli runs ──> paste
Simpler, but slower — the model loads after you stop speaking.
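For comparison, the CLI path reduces to a single command once the audio file exists. The wrapper below is a hypothetical sketch that reuses the config key names; `-nt` (no timestamps) and `-np` (no extra prints) are whisper-cli options, but whether the project passes exactly these flags is an assumption.

```bash
#!/usr/bin/env bash
# Variable names mirror the config keys; transcribe_with_cli is hypothetical.
WHISPER_CLI="${WHISPER_CLI:-whisper-cli}"
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/.local/share/whisper-toggle/models/ggml-small.en.bin}"
WHISPER_THREADS="${WHISPER_THREADS:-4}"
WHISPER_LANGUAGE="${WHISPER_LANGUAGE:-en}"

# Run whisper-cli on a finished recording and print only the transcript.
transcribe_with_cli() {
    local wav="$1"
    "$WHISPER_CLI" -m "$WHISPER_MODEL" -f "$wav" \
        -t "$WHISPER_THREADS" -l "$WHISPER_LANGUAGE" -nt -np
}
```

Because the model is loaded inside this call, the load time is added to every transcription, which is the slowdown described above.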
The setup wizard detects your GPU(s) via lspci and recommends the right device.
| GPU | whisper.cpp Package | Build Flag |
|---|---|---|
| NVIDIA | whisper.cpp-cuda | -DGGML_CUDA=ON |
| AMD | whisper.cpp-hip | -DGGML_HIP=ON |
| Any | whisper.cpp-vulkan | -DGGML_VULKAN=ON |
| None | whisper.cpp | (default) |
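The mapping in the table can be expressed as a small filter over lspci output. This is an illustrative sketch, not the wizard's actual logic: `recommend_package` is a hypothetical name, and matching on vendor strings in display-class lines is an assumption about how detection might work.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lspci output on stdin, print a package name.
recommend_package() {
    local line has_nvidia=0 has_amd=0 has_other=0
    local re='VGA compatible controller|3D controller|Display controller'
    while IFS= read -r line; do
        # Only display-class devices are relevant.
        [[ "$line" =~ $re ]] || continue
        case "$line" in
            *NVIDIA*) has_nvidia=1 ;;
            *AMD*|*"Advanced Micro Devices"*) has_amd=1 ;;
            *) has_other=1 ;;
        esac
    done
    if (( has_nvidia )); then
        echo "whisper.cpp-cuda"
    elif (( has_amd )); then
        echo "whisper.cpp-hip"
    elif (( has_other )); then
        echo "whisper.cpp-vulkan"    # any other GPU: Vulkan build
    else
        echo "whisper.cpp"           # no GPU found: CPU build
    fi
}
```

Typical use would be `lspci | recommend_package`. On systems with both an integrated and a discrete GPU, this version prefers NVIDIA, then AMD.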
No sound recorded
Check that PulseAudio/PipeWire is running and rec can access your mic:
rec -q -t wav /tmp/test.wav rate 16k
Server failed to start
Check the log for whisper-server errors:
cat /tmp/whisper-toggle.log
Common causes: wrong GPU device index, missing CUDA/Vulkan drivers, model file not found.
Nothing pastes
X11: Install xsel and xdotool
Wayland: Install wl-clipboard and ydotool, ensure ydotoold is running
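To check which tool pair applies to your session, the selection logic can be approximated like this. The function names are hypothetical, and the ydotool key syntax (`29:1 47:1 47:0 29:0`, raw key codes for Ctrl and V) assumes ydotool 1.0 or newer; older releases use a different syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real script may detect the session differently.

# Print "wayland" or "x11" based on the session environment.
detect_session() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland|x11) echo "$XDG_SESSION_TYPE" ;;
        *)
            # Fall back to the presence of a Wayland socket name.
            if [[ -n "${WAYLAND_DISPLAY:-}" ]]; then echo wayland; else echo x11; fi
            ;;
    esac
}

# Copy text to the clipboard and send a paste keystroke.
copy_and_paste() {
    local text="$1"
    case "$(detect_session)" in
        wayland)
            printf '%s' "$text" | wl-copy
            # 29 = KEY_LEFTCTRL, 47 = KEY_V; requires ydotoold to be running.
            ydotool key 29:1 47:1 47:0 29:0
            ;;
        x11)
            printf '%s' "$text" | xsel --clipboard --input
            xdotool key --clearmodifiers ctrl+v
            ;;
    esac
}
```

Running `detect_session` in a terminal is a quick way to confirm which set of packages you need.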
Double triggers
The 500 ms debounce should prevent this. If your WM still sends multiple key events, use exec --no-startup-id (i3) or increase the debounce interval in the script.
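For readers who want to see what a debounce of this kind involves, here is a minimal timestamp-file sketch. `STAMP`, `DEBOUNCE_MS`, `now_ms`, and `debounce_ok` are hypothetical names, the script's own mechanism may differ, and `date +%s%3N` relies on GNU date.

```bash
#!/usr/bin/env bash
# Hypothetical names; illustrates the idea, not the project's code.
STAMP="${STAMP:-/dev/shm/whisper-toggle-demo.stamp}"
DEBOUNCE_MS="${DEBOUNCE_MS:-500}"

# Epoch time in milliseconds (GNU date).
now_ms() { date +%s%3N; }

# Return 0 if enough time has passed since the last accepted press.
debounce_ok() {
    local now last=0
    now="$(now_ms)"
    [[ -f "$STAMP" ]] && last="$(<"$STAMP")"
    if (( now - last < DEBOUNCE_MS )); then
        return 1    # too soon: treat as a duplicate key event
    fi
    printf '%s\n' "$now" > "$STAMP"
    return 0
}
```

A script would call `debounce_ok || exit 0` as its first step. The check is not atomic, so two invocations started within the same millisecond could both pass; wrapping it in `flock` would close that gap.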
whisper-server / whisper-cli not found
The script searches ~/whisper.cpp/build/bin/, ~/.local/bin/, /usr/local/bin/, and /usr/bin/. You can also set WHISPER_SERVER / WHISPER_CLI explicitly in the config.
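The lookup order described above can be reproduced to debug a "not found" error. This is a sketch with hypothetical names (`SEARCH_DIRS`, `find_binary`); it assumes an explicit config value takes priority over the search path.

```bash
#!/usr/bin/env bash
# Directories listed in the troubleshooting note, in search order.
SEARCH_DIRS=(
    "$HOME/whisper.cpp/build/bin"
    "$HOME/.local/bin"
    /usr/local/bin
    /usr/bin
)

# find_binary NAME [OVERRIDE]: print the first usable path, or return 1.
find_binary() {
    local name="$1" override="${2:-}" dir
    # An explicit config value (WHISPER_SERVER / WHISPER_CLI) is tried first.
    if [[ -n "$override" && -f "$override" && -x "$override" ]]; then
        printf '%s\n' "$override"
        return 0
    fi
    for dir in "${SEARCH_DIRS[@]}"; do
        if [[ -f "$dir/$name" && -x "$dir/$name" ]]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

For example, `find_binary whisper-server "$WHISPER_SERVER" || echo "whisper-server not found"` shows which path would be picked up.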