Speech to Command

Overview

The Speech-to-Command script converts spoken words into keyboard commands using speech recognition. It captures audio from your microphone, detects when you start speaking, processes it through a speech recognition model, matches recognized text to predefined command phrases, and executes corresponding actions via simulated keyboard inputs.

Features

  • Voice Command Recognition: Converts spoken phrases to predefined commands.
  • Simulated Keyboard Actions: Executes actions such as key presses or hotkey combinations in response to voice commands.
  • Continuous Listening: Monitors audio input until silence is detected.
  • Customizable Commands: Easily modify command mappings and associated actions.
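The "listen until silence is detected" step can be sketched as a simple RMS-level check on each audio chunk. This is a minimal, headless illustration, not the script's actual implementation; the function name `is_silent` and the threshold value are assumptions for the example.

```python
import numpy as np

# Hypothetical RMS threshold; the real script's value may differ.
SILENCE_THRESHOLD = 0.01

def is_silent(chunk: np.ndarray, threshold: float = SILENCE_THRESHOLD) -> bool:
    """Treat an audio chunk as silence when its RMS level is below the threshold."""
    rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
    return rms < threshold

# A spoken chunk (sine tone) vs. a near-silent chunk (low-level noise).
t = np.linspace(0, 1, 16000)
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
silence = 0.001 * np.random.randn(16000)

print(is_silent(speech))   # False
print(is_silent(silence))  # True
```

In the real pipeline, chunks like these would come from the `sounddevice` microphone stream, and consecutive silent chunks would end the recording and trigger transcription.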

Requirements

This project relies on several Python libraries. Install them with the commands below for your platform:

On macOS

python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyobjc

On Fedora

sudo dnf install python3.11-devel portaudio-devel python3.11-tkinter
python3.11 -m ensurepip
python3.11 -m pip install sounddevice numpy openai-whisper scipy pyautogui pyaudio

Note #1: This script has been developed and tested on macOS, especially with regard to its UI element suppression feature.

Note #2: If you have a GPU, Whisper will reserve a certain amount of VRAM. (See line #2 of the script to force CPU.)

Configuration

Commands are configured in a dictionary named COMMAND_MAP. Each command can have multiple trigger phrases associated with it.

  • Multiple Phrases: Each tuple contains several phrases that trigger the same action.
  • Action Definition: Actions are defined as lambda functions or regular functions, which perform key presses or combinations using pyautogui.
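A COMMAND_MAP following the structure described above might look like the sketch below. The phrases and key bindings are made-up examples, and the actions use print placeholders so the snippet runs headless; in the real script the actions call pyautogui (e.g. `pyautogui.press` or `pyautogui.hotkey`) instead.

```python
# Hypothetical command map: tuples of trigger phrases mapped to actions.
COMMAND_MAP = {
    ("open map", "show map"): lambda: print("press: m"),
    ("landing gear", "toggle gear"): lambda: print("press: n"),
}

def dispatch(recognized_text: str) -> bool:
    """Run the action whose trigger phrases include the recognized text."""
    text = recognized_text.strip().lower()
    for phrases, action in COMMAND_MAP.items():
        if text in phrases:
            action()
            return True
    return False

print(dispatch("open map"))      # runs the action, prints True
print(dispatch("hello world"))   # no match, prints False
```

Because each key is a tuple, several spoken variants trigger the same action without duplicating the action definition.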

Usage

Run the script with:

python main.py

Press Ctrl+C to stop the script. The system listens continuously for voice commands and performs actions when recognized phrases match entries in the command map.

Notes

  • Ensure your microphone is properly configured before starting the script.
  • Customize COMMAND_MAP as needed to suit your specific requirements.
  • Commands are executed in a background thread to maintain responsive listening.
  • You need a GUI session; pyautogui will not start without one.
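The background-thread execution mentioned above can be sketched with the standard library's threading module. This is an illustrative pattern, not the script's exact code; the placeholder action stands in for a pyautogui key press.

```python
import threading

def run_in_background(action):
    """Execute a command action on a daemon thread so the audio loop
    can keep listening while the keystrokes are being sent."""
    t = threading.Thread(target=action, daemon=True)
    t.start()
    return t

# Placeholder action standing in for a pyautogui key press.
done = threading.Event()
thread = run_in_background(done.set)
thread.join(timeout=1.0)
print(done.is_set())  # True
```

Daemon threads also mean a Ctrl+C in the main loop stops the program without waiting on in-flight command actions.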

Observations

  • The tiny model on CPU uses one core at 100% and responds quickly enough for my use case.
  • The tiny model on GPU uses 466 MB of VRAM and is faster than on CPU.
  • The small model on GPU uses ~1700 MB of VRAM; it is more accurate than the tiny model and fast enough, but it caused many issues with Star Citizen (making it unplayable).

License

This project is licensed under the MIT License. See the LICENSE file for details.
