Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.
ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks, effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.
- 🖥️ Minimal Interface: Runs quietly in the background via the command line
- ⚡ Triple-Tap Magic: Start/stop recording with a quick Ctrl triple-tap
- 📋 Auto-Paste: Text lands right where you need it, no extra clicks
- 🔊 Audio Cues: Hear when recording begins and ends
- 🍎 Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance
- 🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
- System: macOS 12.3+ (MPS acceleration supported)
- Python: 3.10
- Permissions:
  - 🎤 Microphone (for recording)
  - ⌨️ Accessibility (for shortcuts)
Grant these on first launch and you're good to go!
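The requirements above can be checked before launch with a small preflight script. This is a minimal sketch, not part of ctrlSPEAK: the function name `check_requirements` is hypothetical, and it assumes "Python: 3.10" means exactly the 3.10 series.

```python
import platform
import sys

MIN_MACOS = (12, 3)        # minimum macOS version listed above
REQUIRED_PYTHON = (3, 10)  # assumption: exactly the 3.10 series is required


def check_requirements(system, macos_version, python_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if system != "Darwin":
        problems.append(f"macOS is required, found {system or 'unknown'}")
    else:
        # Keep only major.minor and pad so that "13" compares as (13, 0).
        parts = tuple(int(p) for p in macos_version.split(".")[:2] if p.isdigit())
        parts = (parts + (0, 0))[:2]
        if parts < MIN_MACOS:
            problems.append(f"macOS 12.3+ is required, found {macos_version or 'unknown'}")
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(n) for n in python_version[:2])
        problems.append(f"Python 3.10 is required, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        platform.system(), platform.mac_ver()[0], sys.version_info[:2]
    )
    for issue in issues:
        print(f"Requirement not met: {issue}")
    if not issues:
        print("System requirements look fine.")
```

The check takes its inputs as arguments so it can be exercised on any machine. It does not cover microphone or accessibility permissions, which macOS only grants through its own prompts.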
# Install ctrlSPEAK using Homebrew
brew tap patelnav/ctrlspeak
brew install ctrlspeak
For faster package installation:
# Install with UV support for faster package installation
brew install ctrlspeak --with-uv
Clone the repository:
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak
Create and activate a virtual environment:
# Create a virtual environment
python -m venv .venv
# Activate it on macOS/Linux
source .venv/bin/activate
Install dependencies (recommended with UV for faster installation):
# Install UV first if you don't have it
pip install uv
# Then install dependencies with UV
uv pip install -r requirements.txt
# Or use traditional pip (slower)
pip install -r requirements.txt
For Whisper model support (optional):
# With UV (recommended)
uv pip install -r requirements-whisper.txt
# Or with traditional pip
pip install -r requirements-whisper.txt
- ctrlspeak.py: The full-featured star of the show
- live_transcribe.py: Continuous transcription for testing vibes
- test_transcription.py: Debug or benchmark with ease
- Run ctrlSPEAK in a terminal window:
# If installed with Homebrew
ctrlspeak
# If installed manually (from the project directory with activated venv)
python ctrlspeak.py
- Triple-tap Ctrl to start recording
- Speak clearly into your microphone
- Triple-tap Ctrl again to stop recording
- The transcribed text will be automatically pasted at your cursor position
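The triple-tap toggle described in these steps can be sketched with a sliding time window. This is an illustration only, not ctrlSPEAK's actual implementation: the class name `TripleTapDetector` and the 0.6-second window are assumptions, and wiring `tap()` to real Ctrl key events is left out.

```python
import time


class TripleTapDetector:
    """Reports True when three taps land within `window` seconds (hypothetical helper)."""

    def __init__(self, window=0.6, clock=time.monotonic):
        self.window = window  # assumed value; tune to taste
        self.clock = clock    # injectable so the logic can be tested without sleeping
        self._taps = []

    def tap(self):
        """Register one key tap and return True if it completes a triple-tap."""
        now = self.clock()
        # Drop taps that have fallen out of the sliding window.
        self._taps = [t for t in self._taps if now - t <= self.window]
        self._taps.append(now)
        if len(self._taps) >= 3:
            self._taps.clear()  # reset so the next tap starts a fresh sequence
            return True
        return False


# Usage: each completed triple-tap flips the recording state.
detector = TripleTapDetector()
recording = False
for _ in range(3):  # three taps in quick succession
    if detector.tap():
        recording = not recording
print("recording" if recording else "idle")
```

Clearing the tap history after a match keeps a fourth quick tap from immediately counting toward a second triple-tap.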
ctrlSPEAK uses open-source speech recognition models:
- Parakeet 0.6B (default): NVIDIA NeMo's nvidia/parakeet-tdt-0.6b-v2 model. Good balance of speed, accuracy, punctuation, and capitalization.
- Parakeet 1.1B: NVIDIA NeMo's older nvidia/parakeet-tdt-1.1b model. Potentially higher accuracy in some cases, but lacks punctuation.
- Canary: NVIDIA NeMo's nvidia/canary-1b multilingual model (En, De, Fr, Es) with punctuation, but can be slower.
- Whisper (optional): OpenAI's openai/whisper-large-v3 model. A fast, accurate, and powerful model that includes excellent punctuation and capitalization.
  - To use Whisper, install additional dependencies:
    uv pip install -r requirements-whisper.txt
The models are downloaded automatically from Hugging Face the first time you use them.
You can specify which model to use with the --model flag:
# Using Homebrew installation
ctrlspeak --model parakeet-0.6b # Default
ctrlspeak --model parakeet-1.1b # Older, larger Parakeet
ctrlspeak --model canary # Multilingual with punctuation
ctrlspeak --model whisper # OpenAI's model
# Using manual installation
python ctrlspeak.py --model parakeet-0.6b
python ctrlspeak.py --model parakeet-1.1b
python ctrlspeak.py --model canary
python ctrlspeak.py --model whisper
For debugging, you can use the --debug flag:
ctrlspeak --debug
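As a rough picture of how the --model and --debug flags relate to the underlying models, here is a minimal argparse sketch. The alias-to-model mapping comes from this README; the `MODELS` and `parse_args` names are hypothetical, and ctrlSPEAK's real argument handling may differ.

```python
import argparse

# Alias -> Hugging Face model ID, as listed in this README.
MODELS = {
    "parakeet-0.6b": "nvidia/parakeet-tdt-0.6b-v2",
    "parakeet-1.1b": "nvidia/parakeet-tdt-1.1b",
    "canary": "nvidia/canary-1b",
    "whisper": "openai/whisper-large-v3",
}
DEFAULT_MODEL = "parakeet-0.6b"


def parse_args(argv=None):
    """Parse the two flags documented above (illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="ctrlspeak")
    parser.add_argument(
        "--model",
        choices=sorted(MODELS),
        default=DEFAULT_MODEL,
        help="speech recognition model to load",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="enable verbose debug output",
    )
    return parser.parse_args(argv)


# Usage: resolve the alias to the model ID that would be downloaded.
args = parse_args(["--model", "whisper"])
print(f"{args.model} -> {MODELS[args.model]} (debug={args.debug})")
```

Restricting `--model` to the known aliases via `choices` makes a typo fail immediately with a usage message instead of triggering a failed download later.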
- Parakeet 0.6B (NVIDIA) - nvidia/parakeet-tdt-0.6b-v2 (Default)
- Parakeet 1.1B (NVIDIA) - nvidia/parakeet-tdt-1.1b
- Canary (NVIDIA) - nvidia/canary-1b
- Whisper (OpenAI) - openai/whisper-large-v3
Model | Load Time | Transcription Time | Transcription Quality | Output Example (test.wav) |
---|---|---|---|---|
Parakeet 0.6B | 5.17s | 0.70s | Good w/ Punct. & Caps. | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
Parakeet 1.1B | 10.07s | 1.08s | Good, no punctuation | "well i don't wish to see it any more observed phoebe turning away her eyes it is certainly very like the old portrait" |
Canary | 8.15s | 30.82s | Good w/ Punct. & Caps. | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
Whisper (large-v3) | 4.0s | 4.5s | Excellent w/ Punct. & Caps. | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
Note: The Whisper model runs in translate mode to enable proper punctuation and capitalization for English transcription.
The app requires:
- Microphone access (for recording audio)
- Accessibility permissions (for global keyboard shortcuts)
You'll be prompted to grant these permissions on first run.
- No sound on recording start/stop: Ensure your system volume is not muted
- Keyboard shortcuts not working: Grant accessibility permissions in System Settings
- Transcription errors: Try speaking more clearly or switching to a different model with the --model flag
- Start sound: "Notification Pluck On" from Pixabay
- Stop sound: "Notification Pluck Off" from Pixabay
This section outlines the steps to create a new release and update the associated Homebrew tap.
1. Prepare the Release:
- Ensure the code is stable and tests pass.
- Update the version number in the following files:
  - VERSION (e.g., 1.2.0)
  - __init__.py (__version__ = "1.2.0")
  - pyproject.toml (version = "1.2.0")
- Commit these version changes:
git add VERSION __init__.py pyproject.toml
git commit -m "Bump version to X.Y.Z"
2. Tag and Push:
- Create a git tag matching the version:
git tag vX.Y.Z
- Push the commits and the tag to the remote repository:
git push && git push origin vX.Y.Z
3. Update Homebrew Tap:
- The source code tarball URL is automatically generated based on the tag (usually https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz).
- Download the tarball using its URL and calculate its SHA256 checksum:
# Replace URL with the actual tarball link based on the tag
curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
- Clone or navigate to your Homebrew tap repository (e.g., ../homebrew-ctrlspeak).
- Edit the formula file (e.g., Formula/ctrlspeak.rb):
  - Update the url line with the tag tarball URL.
  - Update the sha256 line with the checksum you calculated.
  - Optional: Update the version line if necessary (though it's often inferred).
  - Optional: If requirements.txt or dependencies changed, update the depends_on and install steps accordingly.
- Commit and push the changes in the tap repository:
cd ../path/to/homebrew-ctrlspeak # Or wherever your tap repo is
git add Formula/ctrlspeak.rb
git commit -m "Update ctrlspeak to vX.Y.Z"
git push
4. Verify (Optional):
- Run brew update locally to fetch the updated formula.
- Run brew upgrade ctrlspeak to install the new version.
- Test the installed version.
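The tarball URL and checksum from step 3 can also be produced in Python as an alternative to the curl and shasum pipeline. This is a sketch under stated assumptions: the helper names (`tarball_url`, `sha256_of_stream`, `release_checksum`) are hypothetical, and the URL pattern is the "usual" one quoted above, so verify it against the actual tag before updating the formula.

```python
import hashlib
import urllib.request


def tarball_url(owner, version):
    """Build the tag tarball URL following the pattern described in step 3."""
    return f"https://github.com/{owner}/ctrlspeak/archive/refs/tags/v{version}.tar.gz"


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary file-like object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def release_checksum(owner, version):
    """Download the tag tarball and return its SHA256 (requires network access)."""
    with urllib.request.urlopen(tarball_url(owner, version)) as response:
        return sha256_of_stream(response)


# Usage: print the URL that would go into the formula's url line.
print(tarball_url("<your-username>", "X.Y.Z"))
```

Calling `release_checksum("<your-username>", "1.2.0")` with a real account name and version should yield the same digest as the shell pipeline, since both hash the raw tarball bytes.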