This repository was archived by the owner on Dec 18, 2025. It is now read-only.

A Python-based speech recognition system that allows you to control your computer using voice commands with push-to-talk functionality. Powered by OpenAI's Whisper model 🗣️


maximboc/speech-recognition


🎤 Speech Command Recognition System

A Python-based speech recognition system that allows you to control your computer using voice commands with push-to-talk functionality. Powered by OpenAI's Whisper model.

✍🏻 Authors

  • Angela Saadé
  • Aurélien Daudin
  • Baptiste Arnold
  • Khaled Mili
  • Maxim Bocquillion

📋 Features

  • Push-to-Talk Interface: Record commands only when you want
  • Dual Interface: Web UI (Streamlit) and Terminal interface
  • CPU-Optimized: Uses lightweight Whisper models (tiny/base)
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Multiple Commands: Control browser, files, volume, and more

🚀 Quick Start

1. Installation

# Clone or download the project files
# Navigate to the project directory

# Install dependencies
pip install -r requirements.txt

2. Running the Application

Option A: Web Interface (Recommended)

streamlit run streamlit_app.py

Then open your browser to http://localhost:8501

Option B: Terminal Interface

python terminal_app.py

📁 Project Structure

speech-command-system/
├── speech_recognizer.py    # Core recognition engine
├── streamlit_app.py         # Web interface
├── terminal_app.py          # Terminal interface
├── requirements.txt         # Dependencies
└── README.md               # This file

🎯 Available Voice Commands

  • open browser - Opens your default web browser
  • close browser - Closes the browser
  • open notepad - Opens text editor (Notepad/TextEdit/gedit)
  • open terminal - Opens terminal/command prompt
  • list files - Lists files in current directory
  • show time - Displays current time
  • show date - Displays current date
  • create folder - Creates a new folder
  • volume up - Increases system volume
  • volume down - Decreases system volume
  • screenshot - Takes a screenshot (requires pyautogui)
  • lock screen - Locks your computer screen
  • help - Shows available commands

🔧 Configuration

Model Selection

The system supports different Whisper model sizes:

  • tiny (39M parameters) - Fastest, good for simple commands (recommended for CPU)
  • base (74M parameters) - Balanced speed and accuracy
  • small (244M parameters) - Better accuracy, slower on CPU

Change the model in the code or UI settings.
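
As a rough illustration of changing the model in code, the sketch below validates a model name before loading it. It assumes the project loads weights through the openai-whisper package's `whisper.load_model`; `MODEL_PARAMS`, `validate_model_name`, and `load_recognizer_model` are hypothetical names, not taken from speech_recognizer.py.

```python
# Published parameter counts (in millions) for the three Whisper sizes.
MODEL_PARAMS = {"tiny": 39, "base": 74, "small": 244}


def validate_model_name(name: str) -> str:
    """Normalize a model name and reject sizes outside the supported set."""
    cleaned = name.strip().lower()
    if cleaned not in MODEL_PARAMS:
        raise ValueError(
            f"Unknown model {name!r}; choose from {sorted(MODEL_PARAMS)}"
        )
    return cleaned


def load_recognizer_model(name: str = "tiny"):
    """Load a Whisper model on the CPU (assumes openai-whisper is installed)."""
    import whisper  # imported lazily so validation works without the package

    return whisper.load_model(validate_model_name(name), device="cpu")
```

Validating first keeps a typo in the model name from triggering a download attempt.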

Recording Duration

Adjust recording duration (2-10 seconds) based on your command length.
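
A minimal sketch of how a fixed-length recording could honor this range, assuming audio is captured with the sounddevice package at 16 kHz mono. The helper names (`clamp_duration`, `frames_for`, `record`) are hypothetical and may not match the project's own code.

```python
SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz mono audio
MIN_SECONDS, MAX_SECONDS = 2, 10  # range quoted in this README


def clamp_duration(seconds: float) -> float:
    """Keep the requested duration inside the supported range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))


def frames_for(seconds: float) -> int:
    """Number of audio frames to capture for a (clamped) duration."""
    return int(clamp_duration(seconds) * SAMPLE_RATE)


def record(seconds: float = 5):
    """Record from the default microphone (assumes sounddevice is installed)."""
    import sounddevice as sd  # lazy import: the helpers above need no audio stack

    audio = sd.rec(frames_for(seconds), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return audio.flatten()  # 1-D float32 array
```

Shorter durations return results sooner, while longer ones leave room for multi-word commands.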

💡 Usage Tips

Web Interface (Streamlit)

  1. Click "Initialize System" in the sidebar
  2. Press "PUSH TO TALK" button
  3. Speak your command clearly
  4. Wait for transcription and execution
  5. View results and history

Terminal Interface

  1. Run the application
  2. Press ENTER or SPACE to start recording
  3. Speak your command
  4. View result
  5. Press 'h' for help, 'l' for history, 'q' to quit

🔍 Troubleshooting

Audio Issues

Problem: No audio recording

# Check your microphone permissions
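# Windows: Settings > Privacy > Microphone (Windows 11: Settings > Privacy & security > Microphone)
# macOS: System Preferences > Security & Privacy > Privacy > Microphone (macOS 13+: System Settings > Privacy & Security > Microphone)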
# Linux: Check PulseAudio/ALSA settings
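
If permissions look correct, it can help to confirm that an input device is visible at all. The sketch below assumes the sounddevice package (the one referenced under Installation Issues) and uses its `query_devices()` call; `input_devices` and `print_microphones` are hypothetical helper names.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can capture audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


def print_microphones():
    """List capture-capable devices (assumes sounddevice is installed)."""
    import sounddevice as sd  # lazy import so the filter works on plain dicts

    for index, name in input_devices(sd.query_devices()):
        print(f"{index}: {name}")
```

An empty list suggests the operating system is not exposing any microphone to Python, which points to a driver or permission problem rather than to the recognizer.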

Problem: Poor recognition accuracy

  • Speak clearly and at a moderate pace
  • Ensure low background noise
  • Use a headset microphone for better quality
  • Try the "base" model for better accuracy

Installation Issues

Problem: PyTorch installation fails

# Use CPU-only version
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

Problem: sounddevice errors

# Linux users might need:
sudo apt-get install portaudio19-dev python3-pyaudio

🎨 Customization

Adding New Commands

Edit speech_recognizer.py and add to the commands dictionary:

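# Inside the recognizer class, where self.commands is defined:
self.commands = {
    # ... existing commands ...
    "your command": self._your_function,
}

# Add the handler as a method of the same class:
def _your_function(self, text: str) -> str:
    # Your command logic here
    return "Command executed successfully"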
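
For a self-contained picture of this pattern, here is a toy sketch with one existing command and one newly added command. `CommandRouter`, `execute`, and the substring matching are assumptions made for illustration; the real class in speech_recognizer.py may match and dispatch differently.

```python
from datetime import datetime
from typing import Callable, Dict


class CommandRouter:
    """Toy stand-in for the recognizer's command table (hypothetical class)."""

    def __init__(self) -> None:
        self.commands: Dict[str, Callable[[str], str]] = {
            "show time": self._show_time,
            "say hello": self._say_hello,  # the newly added command
        }

    def _show_time(self, text: str) -> str:
        return datetime.now().strftime("%H:%M:%S")

    def _say_hello(self, text: str) -> str:
        return "Hello!"

    def execute(self, text: str) -> str:
        """Run the first command whose phrase appears in the transcription."""
        spoken = text.lower().strip()
        for phrase, handler in self.commands.items():
            if phrase in spoken:  # assumption: simple substring matching
                return handler(spoken)
        return f"Unknown command: {text}"
```

Matching on substrings lets a transcription such as "Please say hello." still trigger the command, at the cost of possible false matches between overlapping phrases.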

Platform-Specific Commands

The system automatically detects your OS and executes appropriate commands.
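
One plausible way to implement this kind of detection is a lookup keyed on `platform.system()`. The sketch below covers only the "open notepad" command, using the editors named in the command list; `EDITOR_COMMANDS` and `editor_command` are hypothetical names, and the project's actual dispatch may differ.

```python
import platform
from typing import List, Optional

# Editor launch commands per OS, keyed by the values platform.system() returns.
EDITOR_COMMANDS = {
    "Windows": ["notepad"],
    "Darwin": ["open", "-a", "TextEdit"],  # macOS
    "Linux": ["gedit"],
}


def editor_command(system: Optional[str] = None) -> List[str]:
    """Return the argv used to open a text editor on the given (or current) OS."""
    system = system or platform.system()
    try:
        return EDITOR_COMMANDS[system]
    except KeyError:
        raise OSError(f"Unsupported platform: {system}") from None
```

The returned list can be passed to `subprocess.Popen(editor_command())` to launch the editor without blocking.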

⚙️ System Requirements

  • Python: 3.8 or higher
  • RAM: 2GB minimum (4GB recommended)
  • Storage: 500MB for models and dependencies
  • Microphone: Any working microphone
  • OS: Windows 10+, macOS 10.14+, or Linux (Ubuntu 20.04+)

📝 Notes

  • First run downloads the Whisper model weights (roughly 75 MB for tiny up to about 480 MB for small, assuming the reference openai-whisper checkpoints)
  • Some commands require appropriate system permissions
  • Volume control commands are platform-specific
  • The system works best in quiet environments

🐛 Known Limitations

  • No continuous listening (push-to-talk only)
  • Command execution depends on OS permissions
  • Some commands may not work on all platforms
  • Requires an active internet connection for the first model download

📄 License

This project uses OpenAI's Whisper model, which is released under the MIT License.

🤝 Contributing

Feel free to extend the command set or improve recognition accuracy!

📧 Support

For issues or questions, please refer to:
