The Speech-to-Command script converts spoken words into keyboard commands using speech recognition. It captures audio from your microphone, detects when you start speaking, processes it through a speech recognition model, matches recognized text to predefined command phrases, and executes corresponding actions via simulated keyboard inputs.
- Voice Command Recognition: Converts spoken phrases to predefined commands.
- Simulated Keyboard Actions: Executes actions such as key presses or hotkey combinations in response to voice commands.
- Continuous Listening: Monitors audio input until silence is detected.
- Customizable Commands: Easily modify command mappings and associated actions.
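The continuous-listening behavior above hinges on a per-chunk speech/silence decision. A minimal sketch of that idea using RMS energy over NumPy buffers (the threshold, frame size, and function names here are illustrative assumptions, not values taken from the script):

```python
import numpy as np

# Illustrative values; the actual script may use different ones.
SILENCE_THRESHOLD = 0.01   # RMS level below which a chunk counts as silence

def is_silent(chunk: np.ndarray, threshold: float = SILENCE_THRESHOLD) -> bool:
    """Return True when the chunk's RMS energy falls below the threshold."""
    rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
    return bool(rms < threshold)

def capture_utterance(frames):
    """Collect frames from speech onset until the first silent frame after it."""
    recording, captured = False, []
    for frame in frames:
        if not is_silent(frame):
            recording = True
        elif recording:
            break  # silence following speech: the utterance is complete
        if recording:
            captured.append(frame)
    return captured
```

In the real script the frames would come from a `sounddevice` input stream and the captured audio would be handed to Whisper for transcription.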
This project relies on several Python libraries. Install them using the following command:
On macOS:

```
python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyobjc
```

On Fedora:

```
sudo dnf install python3.11-devel portaudio-devel python3.11-tkinter
python3.11 -m ensurepip
python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyaudio
```

Note #1: This script has been developed and tested on macOS, especially with regard to its UI element suppression feature.
Note #2: If you have a GPU, Whisper will lock a certain amount of VRAM. (See line #2 of the script to force CPU.)
Commands are configured in a dictionary named `COMMAND_MAP`. Each command can have multiple trigger phrases associated with it.
- Multiple Phrases: Each tuple contains several phrases that trigger the same action.
- Action Definition: Actions are defined as lambda functions or regular functions, which perform key presses or combinations using `pyautogui`.
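Assuming that structure, a `COMMAND_MAP` might be sketched as follows. The trigger phrases and keys are made up for illustration, and a stub stands in for `pyautogui` so the example runs without a GUI; the real script would call `pyautogui.press(...)` or `pyautogui.hotkey(...)` inside the lambdas:

```python
# Stand-in for pyautogui so this sketch runs headless; the real script
# would call pyautogui.press(...) / pyautogui.hotkey(...) here instead.
def press(*keys):
    return "+".join(keys)

# Hypothetical command map: each tuple of trigger phrases shares one action.
COMMAND_MAP = {
    ("open map", "show map"): lambda: press("m"),
    ("pause", "pause game", "hold on"): lambda: press("escape"),
    ("quit app", "close window"): lambda: press("command", "q"),
}

def find_action(recognized_text: str):
    """Normalize recognized text and look up a matching action, if any."""
    phrase = recognized_text.lower().strip().rstrip(".!?")
    for triggers, action in COMMAND_MAP.items():
        if phrase in triggers:
            return action
    return None
```

Exact lookup keeps things simple; a fuzzier match (substring or edit distance) could replace it if Whisper's transcriptions vary.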
Run the script with:

```
python main.py
```

Press Ctrl+C to stop the script. The system listens continuously for voice commands and performs actions when recognized phrases match entries in the command map.
- Ensure your microphone is properly configured before starting the script.
- Customize `COMMAND_MAP` as needed to suit your specific requirements.
- Commands are executed in a background thread to maintain responsive listening.
- You need a GUI; otherwise `pyautogui` won't start.
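The background-thread note above can be sketched with the standard `threading` module; the helper name is my own, not taken from the script:

```python
import threading

def run_in_background(action):
    """Fire a matched action on a daemon thread so the audio loop
    keeps listening while the key press executes."""
    worker = threading.Thread(target=action, daemon=True)
    worker.start()
    return worker
```

Daemon threads die with the main process, so a stray action cannot keep the script alive after Ctrl+C.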
- Tiny model on CPU takes one core at 100% and answers fast enough for my use case.
- Tiny model on GPU uses 466 MB of VRAM and is faster than CPU.
- Small model on GPU uses ~1700 MB, is more accurate than Tiny, and is fast enough, but it caused a lot of issues with Star Citizen (making it unplayable).
This project is licensed under the MIT License. See the LICENSE file for details.