Speech to Command

Overview

The Speech-to-Command script converts spoken words into keyboard commands using speech recognition. It captures audio from your microphone, detects when you start speaking, processes it through a speech recognition model, matches recognized text to predefined command phrases, and executes corresponding actions via simulated keyboard inputs.

Features

  • Voice Command Recognition: Converts spoken phrases to predefined commands.
  • Simulated Keyboard Actions: Executes actions such as key presses or hotkey combinations in response to voice commands.
  • Continuous Listening: Monitors audio input until silence is detected.
  • Customizable Commands: Easily modify command mappings and associated actions.
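The "listen until silence is detected" step can be sketched as a simple RMS-level check on each audio chunk. This is a minimal, headless illustration, not the script's actual implementation; the function name `is_silent` and the threshold value are assumptions for the example.

```python
import numpy as np

# Hypothetical RMS threshold; the real script's value may differ.
SILENCE_THRESHOLD = 0.01

def is_silent(chunk: np.ndarray, threshold: float = SILENCE_THRESHOLD) -> bool:
    """Treat an audio chunk as silence when its RMS level is below the threshold."""
    rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
    return rms < threshold

# A spoken chunk (sine tone) vs. a near-silent chunk (low-level noise).
t = np.linspace(0, 1, 16000)
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
silence = 0.001 * np.random.randn(16000)

print(is_silent(speech))   # False
print(is_silent(silence))  # True
```

In the real pipeline, chunks like these would come from the `sounddevice` microphone stream, and consecutive silent chunks would end the recording and trigger transcription.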

Requirements

This project relies on several Python libraries. Install them with the commands below for your platform:

On macOS

python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyobjc

On Fedora

sudo dnf install python3.11-devel portaudio-devel python3.11-tkinter
python3.11 -m ensurepip
python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyaudio

Note #1: This script has been developed and tested on macOS, especially with regard to its UI element suppression feature.

Note #2: If you have a GPU, Whisper will reserve a certain amount of VRAM. (See line #2 of the script to force CPU.)

Configuration

Commands are configured in a dictionary named COMMAND_MAP. Each command can have multiple trigger phrases associated with it.

  • Multiple Phrases: Each tuple contains several phrases that trigger the same action.
  • Action Definition: Actions are defined as lambda functions or regular functions, which perform key presses or combinations using pyautogui.
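A COMMAND_MAP following the structure described above might look like the sketch below. The phrases and key bindings are made-up examples, and the actions use print placeholders so the snippet runs headless; in the real script the actions call pyautogui (e.g. `pyautogui.press` or `pyautogui.hotkey`) instead.

```python
# Hypothetical command map: tuples of trigger phrases mapped to actions.
COMMAND_MAP = {
    ("open map", "show map"): lambda: print("press: m"),
    ("landing gear", "toggle gear"): lambda: print("press: n"),
}

def dispatch(recognized_text: str) -> bool:
    """Run the action whose trigger phrases include the recognized text."""
    text = recognized_text.strip().lower()
    for phrases, action in COMMAND_MAP.items():
        if text in phrases:
            action()
            return True
    return False

print(dispatch("open map"))      # runs the action, prints True
print(dispatch("hello world"))   # no match, prints False
```

Because each key is a tuple, several spoken variants trigger the same action without duplicating the action definition.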

Usage

Run the script with:

python main.py

Press Ctrl+C to stop the script. The system listens continuously for voice commands and performs actions when recognized phrases match entries in the command map.

Notes

  • Ensure your microphone is properly configured before starting the script.
  • Customize COMMAND_MAP as needed to suit your specific requirements.
  • Commands are executed in a background thread to maintain responsive listening.
  • You need a GUI session; pyautogui will not start without one.
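The background-thread execution mentioned above can be sketched with the standard library's threading module. This is an illustrative pattern, not the script's exact code; the placeholder action stands in for a pyautogui key press.

```python
import threading

def run_in_background(action):
    """Execute a command action on a daemon thread so the audio loop
    can keep listening while the keystrokes are being sent."""
    t = threading.Thread(target=action, daemon=True)
    t.start()
    return t

# Placeholder action standing in for a pyautogui key press.
done = threading.Event()
thread = run_in_background(done.set)
thread.join(timeout=1.0)
print(done.is_set())  # True
```

Daemon threads also mean a Ctrl+C in the main loop stops the program without waiting on in-flight command actions.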

Observations

  • The tiny model on CPU uses one core at 100% and responds quickly enough for my use case.
  • The tiny model on GPU uses 466 MB of VRAM and is faster than on CPU.
  • The small model on GPU uses ~1700 MB of VRAM; it is more accurate than the tiny model and fast enough, but it caused many issues with Star Citizen (making it unplayable).

License

This project is licensed under the MIT License. See the LICENSE file for details.
