Turn spoken thought into polished prose—entirely offline.
Press a hotkey, speak naturally, and watch your words transform into clean, punctuated text—ready to send. No cloud. No subscription. Just your voice and your machine.
- Hotkey-activated - Press Scroll Lock to start/stop recording
- Voice Activity Detection - Automatically detects when you stop speaking
- GPU-accelerated transcription - Uses faster-whisper via Docker
- LLM text cleanup - Fixes grammar and punctuation, removes filler words ("um", "uh")
- Voice commands - Say "new paragraph", "send", "delete last"
- System tray - Runs quietly in the background
- 100% local - Your audio never leaves your machine
```
[Hotkey] → Microphone → Voice Detection → Whisper → LLM Cleanup → Paste
                             ↓                          ↓
                        (Silero VAD)              (Ollama, local)
```
All processing happens on your machine. Audio goes to a local Docker container running faster-whisper, then optionally through a local Ollama LLM for text cleanup.
- Windows 10/11
- Python 3.11+
- Docker with:
- faster-whisper container (Wyoming protocol, port 10300)
- Ollama (port 11434) - optional, for LLM text cleanup
- GPU recommended for fast transcription (CPU works but slower)
```bash
# Faster-whisper (Wyoming protocol)
docker run -d --name faster-whisper \
  --gpus all \
  -p 10300:10300 \
  rhasspy/wyoming-whisper:latest \
  --model large-v3 --language en

# Ollama (optional, for text cleanup)
docker run -d --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull an LLM model
docker exec ollama ollama pull qwen3:14b
```

```bash
# Install
git clone https://github.com/cj-elevate/whisper-llm.git
cd whisper-llm
pip install -r requirements.txt

# Configure
cp config.example.yaml config.yaml
# Edit config.yaml to customize settings

# Run
python src/main.py
```

Press Scroll Lock to start/stop recording. Speak naturally, and text appears in your active window.
Edit config.yaml to customize:
| Setting | Default | Description |
|---|---|---|
| `hotkey` | `scroll lock` | Key to toggle recording |
| `audio.silence_duration_ms` | `1000` | Pause before transcribing (ms) |
| `llm.enabled` | `true` | Enable LLM text cleanup |
| `llm.model` | `qwen3:14b` | Ollama model for cleanup |
| `output.method` | `auto` | How to insert text (`clipboard`/`sendinput`/`auto`) |
| `output.clipboard_restore_delay` | `0.15` | Seconds after paste before restoring clipboard |
| `corrections.enabled` | `true` | Enable post-transcription word corrections |
| `commands.auto_enter_slash` | `true` | Auto-press Enter for lone slash commands |
See config.example.yaml for all options with descriptions.
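Loading these settings can be sketched as an overlay of the user's `config.yaml` on built-in defaults. This is a minimal illustration, not the actual `config.py`; the `DEFAULTS` structure below mirrors the table above, and `merge_config` is a hypothetical helper operating on an already-parsed dict:

```python
# Defaults mirroring the settings table; user values overlay them.
DEFAULTS = {
    "hotkey": "scroll lock",
    "audio": {"silence_duration_ms": 1000},
    "llm": {"enabled": True, "model": "qwen3:14b"},
    "output": {"method": "auto", "clipboard_restore_delay": 0.15},
}

def merge_config(user: dict) -> dict:
    """Overlay user settings (e.g. parsed from config.yaml) on defaults,
    merging one level deep so partial sections keep their defaults."""
    cfg = {}
    for key, default in DEFAULTS.items():
        if isinstance(default, dict):
            cfg[key] = {**default, **user.get(key, {})}
        else:
            cfg[key] = user.get(key, default)
    return cfg

cfg = merge_config({"llm": {"model": "llama3"}})
# llm.model is overridden; llm.enabled keeps its default of True
```

The one-level merge means a user who only sets `llm.model` still gets the default `llm.enabled`.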
| Command | Effect |
|---|---|
| "send" | Paste text and press Enter |
| "new paragraph" | Insert blank line |
| "new line" | Insert line break |
| "period" / "comma" | Insert punctuation |
| "slash [command]" | Insert slash command (e.g., "slash team" → "/team") |
| "delete last" | Undo last output (Ctrl+Z) |
When you say a slash command by itself (e.g., "slash team"), the text is inserted and Enter is pressed automatically. This lets you trigger CLI commands and tool shortcuts hands-free.
Examples:
- "slash team" → Outputs `/team` + presses Enter
- "slash end" → Outputs `/end` + presses Enter
- "use slash team" → Outputs `use /team` (no Enter - not a lone command)
This works with any slash command you've added to corrections.words in config.yaml.
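The "lone command" test can be sketched as: after trimming whitespace and any trailing punctuation the transcriber may add, the text must be a single token starting with `/`. The function name is illustrative, not the actual implementation:

```python
def is_lone_slash_command(text: str) -> bool:
    """True if the text is a single /command with nothing else around it."""
    text = text.strip().rstrip(".!?")  # transcripts often end with punctuation
    return text.startswith("/") and " " not in text

print(is_lone_slash_command("/team"))      # True  -> also press Enter
print(is_lone_slash_command("use /team"))  # False -> just insert the text
```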
Whisper sometimes misrecognizes domain-specific words. Add corrections in config.yaml:
```yaml
corrections:
  enabled: true
  words:
    cloud: Claude
    cloud code: Claude Code
    # Slash commands - add your CLI tools here
    slash team: /team
    slash start: /start
```

Corrections are case-insensitive, whole-word only ("cloudy" won't be changed), and longer phrases are matched first.
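Those three matching rules (case-insensitive, whole-word, longest-first) can be expressed with word-boundary regexes. A minimal sketch, not the project's actual corrections code:

```python
import re

def apply_corrections(text: str, words: dict) -> str:
    """Replace misrecognized words: longest phrases first, whole words only,
    case-insensitive -- matching the rules described above."""
    for wrong in sorted(words, key=len, reverse=True):
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, words[wrong], text, flags=re.IGNORECASE)
    return text

words = {"cloud": "Claude", "cloud code": "Claude Code"}
print(apply_corrections("ask cloud code about cloudy weather", words))
# -> "ask Claude Code about cloudy weather"
```

Sorting by length ensures "cloud code" wins before "cloud" gets a chance, and the `\b` boundaries leave "cloudy" untouched.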
| Mode | Description |
|---|---|
| `raw` | No processing, direct transcription |
| `clean` | Fix grammar, punctuation, remove fillers (default) |
Switch modes via the system tray menu.
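In `clean` mode, the transcription is sent to the local Ollama server. A minimal sketch using Ollama's standard `/api/generate` endpoint; the actual prompt and request options in `llm.py` may differ:

```python
import json
import urllib.request

# Illustrative prompt -- the real one lives in the project's LLM module.
CLEANUP_PROMPT = (
    "Clean up this transcription: fix grammar and punctuation, and remove "
    "filler words. Return only the cleaned text.\n\n{text}"
)

def clean_text(text: str, model: str = "qwen3:14b",
               url: str = "http://localhost:11434/api/generate") -> str:
    """Send a raw transcription to a local Ollama server for cleanup."""
    payload = json.dumps({
        "model": model,
        "prompt": CLEANUP_PROMPT.format(text=text),
        "stream": False,  # get one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"].strip()
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full completion.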
See TROUBLESHOOTING.md for common issues:
- Wyoming server connection issues
- LLM timeout on cold start
- Audio capture problems
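A quick first step for connection issues is probing the two localhost ports the app depends on. A minimal sketch, assuming the default ports from the setup above:

```python
import socket

def service_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP service is accepting connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Defaults from the Docker setup: Wyoming whisper on 10300, Ollama on 11434
print("whisper:", service_up("127.0.0.1", 10300))
print("ollama: ", service_up("127.0.0.1", 11434))
```

If either check fails, verify the container is running (`docker ps`) and the port mapping matches your config.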
```
src/
  main.py         # Entry point
  app.py          # System tray, hotkey, lifecycle
  pipeline.py     # Async audio processing
  audio.py        # Microphone capture, VAD
  transcriber.py  # Wyoming protocol client
  llm.py          # Ollama integration
  output.py       # Text injection (clipboard/SendInput)
  config.py       # Configuration loading
```
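The overall flow through these modules can be sketched as a chain of async stages. This is an illustration of the shape, not the actual `pipeline.py`; all stage names are hypothetical:

```python
import asyncio

async def run_pipeline(record, transcribe, clean, paste):
    """Illustrative pipeline: each stage hands its result to the next."""
    audio = await record()          # audio.py: capture until VAD hears silence
    text = await transcribe(audio)  # transcriber.py: Wyoming client
    text = await clean(text)        # llm.py: optional Ollama cleanup
    await paste(text)               # output.py: clipboard / SendInput

# Stub stages to show the flow end to end:
async def demo():
    async def record(): return b"...audio..."
    async def transcribe(a): return "um hello world"
    async def clean(t): return t.replace("um ", "").capitalize()
    async def paste(t): print(t)
    await run_pipeline(record, transcribe, clean, paste)

asyncio.run(demo())  # prints "Hello world"
```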
- No cloud services - All processing is local
- No telemetry - No data collection
- No network calls - Only connects to localhost Docker containers
- Audio stays local - Never transmitted anywhere
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - Use it however you like.
- OpenAI Whisper - Speech recognition model
- faster-whisper - Optimized Whisper implementation
- Wyoming Protocol - Audio streaming protocol
- Ollama - Local LLM server
- Silero VAD - Voice activity detection
