A Python-based speech recognition system that allows you to control your computer using voice commands with push-to-talk functionality. Powered by OpenAI's Whisper model.
- Angela Saadé
- Aurélien Daudin
- Baptiste Arnold
- Khaled Mili
- Maxim Bocquillion
- Push-to-Talk Interface: Record commands only when you want
- Dual Interface: Web UI (Streamlit) and Terminal interface
- CPU-Optimized: Uses lightweight Whisper models (tiny/base)
- Cross-Platform: Works on Windows, macOS, and Linux
- Multiple Commands: Control browser, files, volume, and more
# Clone or download the project files
# Navigate to the project directory
# Install dependencies
pip install -r requirements.txt
streamlit run streamlit_app.py
Then open your browser to http://localhost:8501
python terminal_app.py
speech-command-system/
├── speech_recognizer.py # Core recognition engine
├── streamlit_app.py # Web interface
├── terminal_app.py # Terminal interface
├── requirements.txt # Dependencies
└── README.md # This file
- open browser - Opens your default web browser
- close browser - Closes the browser
- open notepad - Opens text editor (Notepad/TextEdit/gedit)
- open terminal - Opens terminal/command prompt
- list files - Lists files in current directory
- show time - Displays current time
- show date - Displays current date
- create folder - Creates a new folder
- volume up - Increases system volume
- volume down - Decreases system volume
- screenshot - Takes a screenshot (requires pyautogui)
- lock screen - Locks your computer screen
- help - Shows available commands
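As a rough illustration of how a transcription might be mapped to one of the commands above, the sketch below normalizes the text and looks for a known phrase. Whisper output usually includes capitalization and punctuation, so both are stripped first. The `COMMANDS` tuple and the `normalize` and `match_command` helpers are hypothetical names used for illustration; the actual matching logic in speech_recognizer.py may differ.

```python
import string
from typing import Optional

# Hypothetical subset of the phrases listed above.
COMMANDS = ("open browser", "close browser", "show time", "volume up", "help")


def normalize(text: str) -> str:
    """Lowercase the text, strip punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def match_command(transcript: str) -> Optional[str]:
    """Return the first known phrase contained in the transcript, else None."""
    normalized = normalize(transcript)
    for phrase in COMMANDS:
        if phrase in normalized:
            return phrase
    return None
```

Substring matching is forgiving of filler words ("please open browser now"), at the cost of occasional false positives for short phrases such as "help".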
The system supports different Whisper model sizes:
- tiny (39MB) - Fastest, good for simple commands (recommended for CPU)
- base (74MB) - Balanced speed and accuracy
- small (244MB) - Better accuracy, slower on CPU
Change the model in the code or UI settings.
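If you change the model in code, a small validation step can catch typos before a download starts. The sketch below is an assumption about how this could look, not the project's actual code: `MODEL_SIZES_MB`, `validate_model_name`, and `load_whisper_model` are hypothetical names, while `whisper.load_model` is the loader provided by the openai-whisper package.

```python
# Approximate download sizes in MB, taken from the list above.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244}


def validate_model_name(name: str) -> str:
    """Return a normalized model name, or raise ValueError if unsupported."""
    key = name.strip().lower()
    if key not in MODEL_SIZES_MB:
        raise ValueError(
            f"Unsupported model {name!r}; choose from {sorted(MODEL_SIZES_MB)}"
        )
    return key


def load_whisper_model(name: str = "tiny"):
    """Load a Whisper model on the CPU (downloads it on first use)."""
    import whisper  # imported lazily so the helpers above work without it

    return whisper.load_model(validate_model_name(name), device="cpu")
```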
Adjust recording duration (2-10 seconds) based on your command length.
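A minimal sketch of a fixed-length recording, assuming the audio is captured with sounddevice at 16 kHz (the sample rate Whisper operates on). `clamp_duration`, `frames_for`, and `record` are hypothetical helpers; the project's own recording code may be organized differently.

```python
SAMPLE_RATE = 16_000  # Hz; assumed capture rate, matching Whisper's input rate
MIN_SECONDS, MAX_SECONDS = 2.0, 10.0  # range stated above


def clamp_duration(seconds: float) -> float:
    """Keep the requested duration inside the supported 2-10 second range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, float(seconds)))


def frames_for(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio frames needed for the (clamped) duration."""
    return int(round(clamp_duration(seconds) * sample_rate))


def record(seconds: float):
    """Record mono audio from the default microphone and return a 1-D array."""
    import sounddevice as sd  # imported lazily; requires a working microphone

    audio = sd.rec(frames_for(seconds), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return audio.flatten()
```

Shorter durations reduce latency for brief commands such as "volume up", while longer ones leave room for slower speech.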
- Click "Initialize System" in the sidebar
- Press "PUSH TO TALK" button
- Speak your command clearly
- Wait for transcription and execution
- View results and history
- Run the application
- Press ENTER or SPACE to start recording
- Speak your command
- View result
- Press 'h' for help, 'l' for history, 'q' to quit
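The terminal key bindings above could be dispatched with a small lookup like the one below. This is only a sketch: `action_for_key` and the action names are hypothetical, and reading a single keypress is platform-specific and left out here.

```python
from typing import Optional

# Keys other than ENTER/SPACE, mapped to hypothetical action names.
KEY_ACTIONS = {"h": "help", "l": "history", "q": "quit"}


def action_for_key(key: str) -> Optional[str]:
    """Map a pressed key to an action name, or None if the key is unbound."""
    if key in ("\r", "\n", " "):  # ENTER or SPACE starts a recording
        return "record"
    return KEY_ACTIONS.get(key.lower())
```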
Problem: No audio recording
# Check your microphone permissions
# Windows: Settings > Privacy > Microphone
# macOS: System Preferences > Security & Privacy > Microphone
# Linux: Check PulseAudio/ALSA settings
Problem: Poor recognition accuracy
- Speak clearly and at a moderate pace
- Ensure low background noise
- Use a headset microphone for better quality
- Try the "base" model for better accuracy
Problem: PyTorch installation fails
# Use CPU-only version
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
Problem: sounddevice errors
# Linux users might need:
sudo apt-get install portaudio19-dev python3-pyaudio
Edit speech_recognizer.py and add to the commands dictionary:
self.commands = {
    # ... existing commands ...
    "your command": self._your_function,
}

def _your_function(self, text: str) -> str:
    # Your command logic here
    return "Command executed successfully"

The system automatically detects your OS and executes appropriate commands.
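One common way to implement this kind of OS detection is `platform.system()` from the standard library, which returns "Windows", "Darwin" (macOS), or "Linux". The sketch below picks the editor for the "open notepad" command; `editor_command` is a hypothetical helper, and the exact commands the project runs may differ.

```python
import platform
from typing import List, Optional


def editor_command(system: Optional[str] = None) -> List[str]:
    """Return an argv list that opens a basic text editor on the given OS."""
    system = system or platform.system()
    if system == "Windows":
        return ["notepad.exe"]
    if system == "Darwin":  # macOS
        return ["open", "-a", "TextEdit"]
    if system == "Linux":
        return ["gedit"]  # assumes gedit is installed
    raise OSError(f"Unsupported platform: {system}")
```

The returned list can be passed to `subprocess.Popen(editor_command())` so the editor opens without blocking the recognizer.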
- Python: 3.8 or higher
- RAM: 2GB minimum (4GB recommended)
- Storage: 500MB for models and dependencies
- Microphone: Any working microphone
- OS: Windows 10+, macOS 10.14+, or Linux (Ubuntu 20.04+)
- First run downloads the Whisper model (~40-250MB depending on size)
- Some commands require appropriate system permissions
- Volume control commands are platform-specific
- The system works best in quiet environments
- No continuous listening (push-to-talk only)
- Command execution depends on OS permissions
- Some commands may not work on all platforms
- Requires an active internet connection for the first model download
This project uses OpenAI's Whisper model, which is released under the MIT License.
Feel free to extend the command set or improve recognition accuracy!
For issues or questions, please refer to:
- Whisper documentation: https://github.com/openai/whisper
- Streamlit docs: https://docs.streamlit.io