🎙️ A simple voice-to-text tool for Linux using Whisper, ffmpeg, and X11/Wayland tools.
- Record voice with a global shortcut
- Automatically transcribe audio using OpenAI Whisper
- Type transcription directly into the focused text field
- Copy transcription to clipboard
- Supports X11 (xdotool + xclip) and Wayland (wtype + wl-copy)
- Linux distribution with bash
- Python 3.8+
- ffmpeg
- xdotool (X11) or wtype (Wayland)
- xclip (X11) or wl-clipboard (Wayland)
- libnotify (notify-send)
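The requirements above can be checked before running the installer. The following is a minimal sketch, not part of the repository: `missing_commands` is a hypothetical helper name, and the tool list is copied from the requirements. It prints each command that is not found on `PATH` and prints nothing when everything is present.

```shell
# Hypothetical helper (not part of this repository): print every command
# from the argument list that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tools needed regardless of session type, taken from the list above.
missing_commands bash python3 ffmpeg notify-send
```

On X11 you could additionally pass `xdotool xclip`, and on Wayland `wtype wl-copy`.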
- Clone this repository:
git clone https://github.com/StefTzor/voice-type-linux.git
cd voice-type-linux
- Run the installer script to install dependencies and set up the Python environment:
./install.sh
- Activate the Python virtual environment:
source ~/voice-type-env/bin/activate
- Make the scripts executable (if needed):
chmod +x bin/start_voice_type.sh bin/stop_voice_type.sh
- Start recording:
./bin/start_voice_type.sh
- Stop recording and transcribe:
./bin/stop_voice_type.sh
- Open System Settings > Shortcuts > Custom Shortcuts.
- Create a new global shortcut:
  - Trigger: your preferred key combo (e.g., Ctrl+Alt+R) to start recording
  - Action: run command
    /path/to/voice-type-linux/bin/start_voice_type.sh
- Create another global shortcut:
  - Trigger: another key combo (e.g., Ctrl+Alt+S) to stop recording & transcribe
  - Action: run command
    /path/to/voice-type-linux/bin/stop_voice_type.sh
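Desktop shortcut settings often fail silently when the command path is wrong or the file is not executable, so it can be worth verifying both targets first. This is a small sketch under assumptions: `check_shortcut_target` is a hypothetical helper, and `REPO_DIR` defaults to a guessed checkout location that you should replace with the real path to your clone.

```shell
# Hypothetical helper: succeed only if the path is a regular, executable file.
check_shortcut_target() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "not executable or missing: $1"
        return 1
    fi
}

# Assumed checkout location; override REPO_DIR with your actual clone path.
REPO_DIR="${REPO_DIR:-$HOME/voice-type-linux}"
for script in start_voice_type.sh stop_voice_type.sh; do
    check_shortcut_target "$REPO_DIR/bin/$script" || true
done
```

If either line reports a problem, fix the path in the shortcut or rerun the `chmod +x` step.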
- On Wayland, you need wtype and wl-clipboard installed. On X11, xdotool and xclip are required.
- If you use another desktop environment, adjust dependencies accordingly.
- The scripts automatically detect whether they are running under Wayland or X11 and switch tools accordingly.
- The recorded audio is saved temporarily in /tmp/voice_input.wav and the transcribed text in /tmp/voice_input.txt.
- You can customize the recording device in start_voice_type.sh if needed.
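This README does not show how the scripts tell Wayland from X11. One common approach, sketched here as an assumption rather than the project's actual code, is to inspect the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables. `detect_session`, `TYPE_CMD`, and `COPY_CMD` are hypothetical names used only for illustration.

```shell
# Hypothetical helper: report "wayland" or "x11" for the current session.
# Assumes the login session exports XDG_SESSION_TYPE or WAYLAND_DISPLAY.
detect_session() {
    if [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    else
        echo x11
    fi
}

# Pick the typing and clipboard tools named in this README for that session.
case "$(detect_session)" in
    wayland) TYPE_CMD="wtype";        COPY_CMD="wl-copy" ;;
    *)       TYPE_CMD="xdotool type"; COPY_CMD="xclip -selection clipboard" ;;
esac
echo "typing with: $TYPE_CMD, clipboard via: $COPY_CMD"
```

Running this in a terminal is a quick way to see which pair of tools applies to your session.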
- If transcription doesn’t type correctly, check that xdotool or wtype is installed and working.
- If clipboard copy fails, verify xclip or wl-copy availability.
- Notifications require libnotify (notify-send).
- Make sure your keyboard shortcuts are properly set in your DE.
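When nothing is typed at all, the two temporary files can hint at which stage to look at first. The sketch below assumes the scripts leave /tmp/voice_input.wav and /tmp/voice_input.txt in place after a run, which this README does not state explicitly. `diagnose_stage` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: guess which stage to inspect from the two temp files.
# Prints "recording" if the audio file is missing or empty, "transcription"
# if the text file is missing or empty, and "output" otherwise.
diagnose_stage() {
    wav=$1
    txt=$2
    if [ ! -s "$wav" ]; then
        echo "recording"
    elif [ ! -s "$txt" ]; then
        echo "transcription"
    else
        echo "output"
    fi
}

echo "Stage to inspect first: $(diagnose_stage /tmp/voice_input.wav /tmp/voice_input.txt)"
```

A result of `recording` points at ffmpeg or the input device, `transcription` at Whisper or the Python environment, and `output` at the typing and clipboard tools listed above.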
This project is licensed under the MIT License - see the LICENSE file for details.
Project Link: https://github.com/StefTzor/voice-type-linux