
whisper-llm

Turn spoken thought into polished prose—entirely offline.

Press a hotkey, speak naturally, and watch your words transform into clean, punctuated text—ready to send. No cloud. No subscription. Just your voice and your machine.

Features

  • Hotkey-activated - Press Scroll Lock to start/stop recording
  • Voice Activity Detection - Automatically detects when you stop speaking
  • GPU-accelerated transcription - Uses faster-whisper via Docker
  • LLM text cleanup - Fixes grammar, punctuation, removes filler words ("um", "uh")
  • Voice commands - Say "new paragraph", "send", "delete last"
  • System tray - Runs quietly in the background
  • 100% local - Your audio never leaves your machine

How It Works

[Hotkey] → Microphone → Voice Detection → Whisper → LLM Cleanup → Paste
                              ↓                         ↓
                        (Silero VAD)              (Ollama, local)

All processing happens on your machine. Audio goes to a local Docker container running faster-whisper, then optionally through a local Ollama LLM for text cleanup.
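The pipeline above can be sketched end-to-end in a few lines. This is purely illustrative — the function names are hypothetical stand-ins for the real stages in `src/pipeline.py`, with placeholder bodies instead of actual audio, Wyoming, or Ollama calls:

```python
# Illustrative sketch of the processing pipeline described above.
# Function names and bodies are placeholders, not the app's real code.

def record_until_silence() -> bytes:
    """Capture microphone audio until the VAD (Silero) detects a pause."""
    return b"\x00" * 16000  # placeholder: ~1 s of silence at 16 kHz

def transcribe(audio: bytes) -> str:
    """Send audio to the local faster-whisper container (Wyoming, port 10300)."""
    return "um so this is uh a test"  # placeholder transcript

def llm_cleanup(text: str) -> str:
    """Optionally pass the transcript through a local Ollama model."""
    fillers = {"um", "uh"}
    return " ".join(word for word in text.split() if word not in fillers)

def run_once() -> str:
    """One hotkey press: record, transcribe, clean, return text to paste."""
    audio = record_until_silence()
    return llm_cleanup(transcribe(audio))
```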

Requirements

  • Windows 10/11
  • Python 3.11+
  • Docker with:
    • faster-whisper container (Wyoming protocol, port 10300)
    • Ollama (port 11434) - optional, for LLM text cleanup
  • GPU recommended for fast transcription (CPU works but slower)

Quick Start

1. Start Docker containers

# Faster-whisper (Wyoming protocol)
docker run -d --name faster-whisper \
  --gpus all \
  -p 10300:10300 \
  rhasspy/wyoming-whisper:latest \
  --model large-v3 --language en

# Ollama (optional, for text cleanup)
docker run -d --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull an LLM model
docker exec ollama ollama pull qwen3:14b

2. Install whisper-llm

git clone https://github.com/cj-elevate/whisper-llm.git
cd whisper-llm
pip install -r requirements.txt

3. Configure

cp config.example.yaml config.yaml
# Edit config.yaml to customize settings

4. Run

python src/main.py

Press Scroll Lock to start/stop recording. Speak naturally, and text appears in your active window.
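If nothing happens, a quick way to confirm both backends are reachable is a TCP check against the default ports from the Quick Start (adjust if you mapped different ones). This helper is not part of the project, just a standalone sanity check:

```python
# Standalone connectivity check for the two local services the app talks to.
# Ports are the Quick Start defaults: 10300 (faster-whisper), 11434 (Ollama).
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("faster-whisper (10300):", port_open("localhost", 10300))
    print("ollama         (11434):", port_open("localhost", 11434))
```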

Configuration

Edit config.yaml to customize:

| Setting | Default | Description |
| --- | --- | --- |
| `hotkey` | `scroll lock` | Key to toggle recording |
| `audio.silence_duration_ms` | `1000` | Pause before transcribing (ms) |
| `llm.enabled` | `true` | Enable LLM text cleanup |
| `llm.model` | `qwen3:14b` | Ollama model for cleanup |
| `output.method` | `auto` | How to insert text (clipboard/sendinput/auto) |
| `output.clipboard_restore_delay` | `0.15` | Seconds after paste before restoring clipboard |
| `corrections.enabled` | `true` | Enable post-transcription word corrections |
| `commands.auto_enter_slash` | `true` | Auto-press Enter for lone slash commands |

See config.example.yaml for all options with descriptions.
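Put together, a minimal `config.yaml` using the defaults from the table above might look like this (key nesting is inferred from the dotted setting names; `config.example.yaml` is the authoritative reference):

```yaml
hotkey: scroll lock
audio:
  silence_duration_ms: 1000
llm:
  enabled: true
  model: qwen3:14b
output:
  method: auto
  clipboard_restore_delay: 0.15
corrections:
  enabled: true
commands:
  auto_enter_slash: true
```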

Voice Commands

| Command | Effect |
| --- | --- |
| "send" | Paste text and press Enter |
| "new paragraph" | Insert blank line |
| "new line" | Insert line break |
| "period" / "comma" | Insert punctuation |
| "slash [command]" | Insert slash command (e.g., "slash team" → "/team") |
| "delete last" | Undo last output (Ctrl+Z) |

Auto-Enter for Slash Commands

When you say a slash command by itself (e.g., "slash team"), the command is typed out and Enter is pressed automatically. This lets you trigger CLI commands and tool shortcuts hands-free.

Examples:

  • "slash team" → Outputs /team + presses Enter
  • "slash end" → Outputs /end + presses Enter
  • "use slash team" → Outputs use /team (no Enter - not a lone command)

This works with any slash command you've added to corrections.words in config.yaml.
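The "lone command" rule can be sketched as a single check — the text, after corrections, must be exactly one slash command with nothing around it. This function is illustrative, not the app's actual implementation:

```python
def is_lone_slash_command(text: str) -> bool:
    """True when the (already-corrected) text is exactly one slash command
    on its own, which should trigger an automatic Enter press.

    Illustrative sketch -- the real check lives in the app's pipeline.
    """
    stripped = text.strip()
    return stripped.startswith("/") and " " not in stripped
```

By this rule, `"/team"` qualifies, while `"use /team"` does not, because the command is embedded in a longer phrase.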

Word Corrections

Whisper sometimes misrecognizes domain-specific words. Add corrections in config.yaml:

corrections:
  enabled: true
  words:
    cloud: Claude
    cloud code: Claude Code
    # Slash commands - add your CLI tools here
    slash team: /team
    slash start: /start

Corrections are case-insensitive, whole-word only ("cloudy" won't be changed), and longer phrases are matched first.
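Those three properties — case-insensitive, whole-word, longest-phrase-first — can be expressed with word-boundary regexes, sorting patterns by length so "cloud code" wins over "cloud". A minimal sketch (not the project's actual matcher):

```python
import re

def apply_corrections(text: str, words: dict[str, str]) -> str:
    """Apply whole-word, case-insensitive corrections, longest phrase first.

    Illustrative sketch of the behavior described above; \\b word boundaries
    keep "cloudy" from matching the "cloud" entry.
    """
    for wrong, right in sorted(words.items(), key=lambda kv: -len(kv[0])):
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text
```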

LLM Modes

| Mode | Description |
| --- | --- |
| `raw` | No processing, direct transcription |
| `clean` | Fix grammar, punctuation, remove fillers (default) |

Switch modes via the system tray menu.
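In `clean` mode the transcript goes to Ollama's standard `/api/generate` endpoint. The sketch below shows the request shape; the prompt wording and function names are assumptions for illustration, not the app's actual prompt (see `src/llm.py`):

```python
# Illustrative "clean" mode request to the local Ollama HTTP API.
# The prompt text here is an assumption, not the app's real prompt.
import json
import urllib.request

CLEAN_PROMPT = (
    "Fix grammar and punctuation in the transcript below and remove "
    "filler words. Return only the cleaned text.\n\nTranscript: {text}"
)

def build_request(text: str, model: str = "qwen3:14b") -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    return {
        "model": model,
        "prompt": CLEAN_PROMPT.format(text=text),
        "stream": False,
    }

def clean(text: str) -> str:
    """POST to the local Ollama container and return the cleaned text."""
    body = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"].strip()
```

In `raw` mode this step is skipped entirely and the Whisper transcript is pasted as-is.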

Troubleshooting

See TROUBLESHOOTING.md for common issues:

  • Wyoming server connection issues
  • LLM timeout on cold start
  • Audio capture problems

Project Structure

src/
  main.py        # Entry point
  app.py         # System tray, hotkey, lifecycle
  pipeline.py    # Async audio processing
  audio.py       # Microphone capture, VAD
  transcriber.py # Wyoming protocol client
  llm.py         # Ollama integration
  output.py      # Text injection (clipboard/SendInput)
  config.py      # Configuration loading

Privacy

  • No cloud services - All processing is local
  • No telemetry - No data collection
  • No network calls - Only connects to localhost Docker containers
  • Audio stays local - Never transmitted anywhere

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - Use it however you like.
