🎤 Speech Command Recognition System

A Python-based speech recognition system that allows you to control your computer using voice commands with push-to-talk functionality. Powered by OpenAI's Whisper model.

✍🏻 Authors

Angela Saadé
Aurélien Daudin
Baptiste Arnold
Khaled Mili
Maxim Bocquillion

📋 Features

Push-to-Talk Interface: Record commands only when you want
Dual Interface: Web UI (Streamlit) and Terminal interface
CPU-Optimized: Uses lightweight Whisper models (tiny/base)
Cross-Platform: Works on Windows, macOS, and Linux
Multiple Commands: Control browser, files, volume, and more

🚀 Quick Start

1. Installation

# Clone or download the project files
# Navigate to the project directory

# Install dependencies
pip install -r requirements.txt

2. Running the Application

Option A: Web Interface (Recommended)

streamlit run streamlit_app.py

Then open your browser to http://localhost:8501

Option B: Terminal Interface

python terminal_app.py

📁 Project Structure

speech-command-system/
├── speech_recognizer.py    # Core recognition engine
├── streamlit_app.py         # Web interface
├── terminal_app.py          # Terminal interface
├── requirements.txt         # Dependencies
└── README.md               # This file

🎯 Available Voice Commands

open browser - Opens your default web browser
close browser - Closes the browser
open notepad - Opens text editor (Notepad/TextEdit/gedit)
open terminal - Opens terminal/command prompt
list files - Lists files in current directory
show time - Displays current time
show date - Displays current date
create folder - Creates a new folder
volume up - Increases system volume
volume down - Decreases system volume
screenshot - Takes a screenshot (requires pyautogui)
lock screen - Locks your computer screen
help - Shows available commands
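This README does not say how a spoken transcript is mapped to one of these commands. The sketch below is one possible approach. It assumes whole-word phrase matching against a dictionary shaped like the `self.commands` table under Adding New Commands. The handlers `show_time`/`volume_up` and the helpers `normalize`, `match_command` and `dispatch` are hypothetical and do not come from `speech_recognizer.py`.

```python
import string
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]

# Hypothetical handlers standing in for the real ones in speech_recognizer.py.
def show_time(text: str) -> str:
    return "time requested"

def volume_up(text: str) -> str:
    return "volume up requested"

COMMANDS: Dict[str, Handler] = {
    "show time": show_time,
    "volume up": volume_up,
}

def normalize(transcript: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace (Whisper output is typically punctuated, e.g. "Show time.")."""
    cleaned = transcript.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())

def match_command(transcript: str, commands: Dict[str, Handler]) -> Optional[str]:
    """Return the longest command phrase found as whole words in the transcript, or None."""
    padded = f" {normalize(transcript)} "
    hits = [phrase for phrase in commands if f" {phrase} " in padded]
    return max(hits, key=len) if hits else None

def dispatch(transcript: str, commands: Dict[str, Handler]) -> str:
    """Run the handler for the matched phrase, passing it the normalized transcript."""
    phrase = match_command(transcript, commands)
    if phrase is None:
        return "Unrecognized command"
    return commands[phrase](normalize(transcript))
```

Padding both strings with spaces keeps "volume up" from matching inside "volume upward". Choosing the longest hit lets a specific phrase take priority over a shorter phrase that it contains.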

🔧 Configuration

Model Selection

The system supports different Whisper model sizes:

tiny (39M parameters) - Fastest, good for simple commands (recommended for CPU)
base (74M parameters) - Balanced speed and accuracy
small (244M parameters) - Better accuracy, slower on CPU

Change the model in the code or UI settings.
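As a rough illustration of selecting a model in code, the sketch below validates the size name and then calls `whisper.load_model` from the `openai-whisper` package. The wrapper name `load_model` and the `WHISPER_MODELS` table are hypothetical. The parameter counts come from the Whisper README.

```python
# Parameter counts (millions) as listed in the Whisper README.
WHISPER_MODELS = {"tiny": 39, "base": 74, "small": 244}

def load_model(name: str = "tiny", device: str = "cpu"):
    """Load a Whisper model by size name; raises ValueError for sizes this app doesn't offer."""
    if name not in WHISPER_MODELS:
        raise ValueError(f"Unsupported model {name!r}; choose one of {sorted(WHISPER_MODELS)}")
    import whisper  # openai-whisper; imported lazily so the name check works without it
    return whisper.load_model(name, device=device)
```

Usage, assuming `audio` is a 16 kHz mono float32 NumPy array: `load_model("base").transcribe(audio, language="en", fp16=False)["text"]`. Passing `fp16=False` avoids the half-precision warning Whisper prints on CPU.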

Recording Duration

Adjust the recording duration (2-10 seconds) to fit the length of your command.
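The sketch below shows a fixed-length push-to-talk recording with `sounddevice`, clamped to the 2-10 second range. The 16 kHz mono format is an assumption based on what Whisper expects. The app's actual sample rate and the helper names here (`clamp_duration`, `num_frames`, `record`) are hypothetical.

```python
SAMPLE_RATE = 16_000  # assumed: Whisper works on 16 kHz mono audio
MIN_SECONDS, MAX_SECONDS = 2, 10  # range stated in this README

def clamp_duration(seconds: float) -> float:
    """Keep the requested duration inside the supported 2-10 s window."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))

def num_frames(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples to capture for the (clamped) duration."""
    return int(clamp_duration(seconds) * sample_rate)

def record(seconds: float):
    """Record one mono clip and return it as a 1-D float32 NumPy array."""
    import sounddevice as sd
    audio = sd.rec(num_frames(seconds), samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return audio.flatten()  # (frames, 1) -> (frames,)
```

Longer windows give slow speakers more room, but they also add trailing silence that Whisper still has to process. That is one reason to keep the duration short on CPU.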

💡 Usage Tips

Web Interface (Streamlit)

Click "Initialize System" in the sidebar
Press "PUSH TO TALK" button
Speak your command clearly
Wait for transcription and execution
View results and history

Terminal Interface

Run the application
Press ENTER or SPACE to start recording
Speak your command
View result
Press 'h' for help, 'l' for history, 'q' to quit
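The key handling described above can be sketched as a small dispatch loop. In this sketch, `run_terminal_loop` and its callbacks are hypothetical, and `terminal_app.py` may read keys differently. Reading input with `input()` means ENTER arrives as an empty string.

```python
from typing import Callable, List

def run_terminal_loop(read_key: Callable[[], str],
                      on_record: Callable[[], str],
                      show_help: Callable[[], None] = lambda: None) -> List[str]:
    """ENTER/SPACE records a command, 'h' shows help, 'l' lists history, 'q' quits."""
    history: List[str] = []
    while True:
        key = read_key()
        if key in ("", " ", "\n", "\r"):
            history.append(on_record())  # record, transcribe, execute; keep the result
        elif key == "h":
            show_help()
        elif key == "l":
            for i, entry in enumerate(history, 1):
                print(f"{i}. {entry}")
        elif key == "q":
            return history
        # any other key is ignored
```

Injecting `read_key` (for example `input`) and `on_record` keeps the loop free of audio code, so you can test it without a microphone.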

🔍 Troubleshooting

Audio Issues

Problem: No audio recording

# Check your microphone permissions
# Windows: Settings > Privacy (Privacy & security on Windows 11) > Microphone
# macOS: System Settings > Privacy & Security > Microphone (System Preferences > Security & Privacy on older versions)
# Linux: Check PulseAudio/ALSA settings
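Once permissions are set, a quick check can confirm whether the microphone produces any signal. This sketch lists devices with `sounddevice.query_devices()` and then records one second. The helper names and the 0.01 peak threshold are assumptions, not part of the project.

```python
from typing import Iterable

SILENCE_THRESHOLD = 0.01  # assumed peak level (float32 full scale = 1.0) treated as silence

def peak_level(samples: Iterable[float]) -> float:
    """Largest absolute sample value; 0.0 for an empty recording."""
    return max((abs(float(s)) for s in samples), default=0.0)

def looks_silent(samples: Iterable[float], threshold: float = SILENCE_THRESHOLD) -> bool:
    return peak_level(samples) < threshold

def mic_check(seconds: float = 1.0, sample_rate: int = 16_000) -> None:
    import sounddevice as sd
    print(sd.query_devices())  # audio devices sounddevice can see
    audio = sd.rec(int(seconds * sample_rate), samplerate=sample_rate, channels=1, dtype="float32")
    sd.wait()
    samples = audio.ravel()  # (frames, 1) -> (frames,)
    level = peak_level(samples)
    if looks_silent(samples):
        print(f"Peak {level:.4f}: no signal; check permissions and the default input device.")
    else:
        print(f"Peak {level:.4f}: microphone is picking up sound.")
```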

Problem: Poor recognition accuracy

Speak clearly and at a moderate pace
Ensure low background noise
Use a headset microphone for better quality
Try the "base" or "small" model for better accuracy

Installation Issues

Problem: PyTorch installation fails

# Use CPU-only version
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

Problem: sounddevice errors

# Linux users might need:
sudo apt-get install portaudio19-dev python3-pyaudio

🎨 Customization

Adding New Commands

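Edit speech_recognizer.py: add an entry to the commands dictionary and a matching handler method in the same class:

# Inside the recognizer class, where self.commands is defined:
# map the spoken phrase to its handler method
self.commands = {
    # ... existing commands ...
    "your command": self._your_function,
}

# Handler method on the same class; text is the recognized command text
def _your_function(self, text: str) -> str:
    # Your command logic here
    return "Command executed successfully"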
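The `text` argument lets a handler read extra words from the command. The self-contained sketch below follows the pattern above. `MiniRecognizer` is a hypothetical stand-in for the real class, and its "create folder named ..." parsing is an illustration, not how the project's existing create folder command works.

```python
import os
import re
from typing import Callable, Dict

class MiniRecognizer:
    """Hypothetical stand-in for the recognizer class, reduced to its command table."""

    def __init__(self, base_dir: str = ".") -> None:
        self.base_dir = base_dir
        self.commands: Dict[str, Callable[[str], str]] = {
            "create folder": self._create_folder,
        }

    def _create_folder(self, text: str) -> str:
        # Use the words after "named"/"called" as the folder name, e.g. "create folder named reports".
        match = re.search(r"\b(?:named|called)\s+(.+)$", text, re.IGNORECASE)
        raw = match.group(1).strip().replace(" ", "_") if match else ""
        # Keep only word characters and hyphens so the name cannot escape base_dir.
        name = re.sub(r"[^\w\-]", "", raw) or "new_folder"
        path = os.path.join(self.base_dir, name)
        os.makedirs(path, exist_ok=True)
        return f"Created folder: {path}"
```

Sanitizing the name matters because transcripts can contain punctuation (for example a trailing period) that should not end up in a file system path.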

Platform-Specific Commands

The system detects your operating system automatically and runs the matching OS-specific command (for example Notepad, TextEdit, or gedit for "open notepad").
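One way to implement this kind of OS dispatch is with `platform.system()`, as the sketch below does for the "open notepad" command. The table and function names are hypothetical. The editor choices follow the command list above, but the project's actual commands may differ.

```python
import platform
import subprocess
from typing import List, Optional

# Editor launch commands per OS, mirroring "open notepad" (Notepad / TextEdit / gedit).
EDITOR_COMMANDS = {
    "Windows": ["notepad"],
    "Darwin": ["open", "-a", "TextEdit"],  # platform.system() reports macOS as "Darwin"
    "Linux": ["gedit"],
}

def editor_command(system: Optional[str] = None) -> List[str]:
    """Return the argv for launching a text editor on the given (or current) OS."""
    system = system or platform.system()
    try:
        return EDITOR_COMMANDS[system]
    except KeyError:
        raise NotImplementedError(f"No editor configured for {system!r}") from None

def open_editor() -> None:
    # Popen returns immediately, so the voice loop is not blocked while the editor runs.
    subprocess.Popen(editor_command())
```

Putting the per-OS differences in a table keeps each handler short. Supporting another platform then means adding one entry instead of another if/else branch.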

⚙️ System Requirements

Python: 3.8 or higher
RAM: 2GB minimum (4GB recommended)
Storage: 500MB for models and dependencies
Microphone: Any working microphone
OS: Windows 10+, macOS 10.14+, or Linux (Ubuntu 20.04+)

📝 Notes

First run downloads the selected Whisper model (roughly 75MB for tiny, 145MB for base, 480MB for small)
Some commands require appropriate system permissions
Volume control commands are platform-specific
The system works best in quiet environments

🐛 Known Limitations

No continuous listening (push-to-talk only)
Command execution depends on OS permissions
Some commands may not work on all platforms
Requires an internet connection for the first model download

📄 License

This project uses OpenAI's Whisper model, which is released under the MIT License.

🤝 Contributing

Feel free to extend the command set or improve recognition accuracy!

📧 Support

For issues or questions, please refer to:

Whisper documentation: https://github.com/openai/whisper
Streamlit docs: https://docs.streamlit.io
