🎙️ ctrlSPEAK

Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.

🚀 Overview

ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks — effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.

✨ Features

🖥️ Minimal Interface: Runs quietly in the background via the command line
⚡ Triple-Tap Magic: Start/stop recording with a quick Ctrl triple-tap
📋 Auto-Paste: Text lands right where you need it, no extra clicks
🔊 Audio Cues: Hear when recording begins and ends
🍎 Mac Optimized: Uses Metal Performance Shaders (MPS) for GPU acceleration on Apple Silicon
🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
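For the curious: the NeMo and Whisper models run on PyTorch, and PyTorch-based apps typically choose the MPS device with a check like the one below. This is a generic sketch, not code from ctrlSPEAK, and `pick_device` is a hypothetical name. `torch.backends.mps.is_available()` returns False when the Metal backend can't be used (for example, on macOS older than 12.3), so the sketch falls back to the CPU.

```python
def pick_device() -> str:
    """Hypothetical helper: return "mps" if PyTorch's Metal backend is usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, so there is nothing to accelerate
    # Very old PyTorch builds have no torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```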

🛠️ Get Started

System: macOS 12.3+ (MPS acceleration supported)
Python: 3.10
Permissions:
- 🎤 Microphone (for recording)
- ⌨️ Accessibility (for shortcuts)
  Grant these on first launch and you're good to go!

📦 Installation

Using Homebrew (Recommended)

# Install ctrlSPEAK using Homebrew
brew tap patelnav/ctrlspeak
brew install ctrlspeak

For faster package installation:

# Install with UV support for faster package installation
brew install ctrlspeak --with-uv

Manual Installation

Clone the repository:

git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak

Create and activate a virtual environment:

# Create a virtual environment
python -m venv .venv

# Activate it on macOS/Linux
source .venv/bin/activate

Install dependencies (recommended with UV for faster installation):

# Install UV first if you don't have it
pip install uv

# Then install dependencies with UV
uv pip install -r requirements.txt

# Or use traditional pip (slower)
pip install -r requirements.txt

For Whisper model support (optional):

# With UV (recommended)
uv pip install -r requirements-whisper.txt

# Or with traditional pip
pip install -r requirements-whisper.txt

🧰 Entry Points

ctrlspeak.py: The full-featured star of the show
live_transcribe.py: Continuous transcription for testing vibes
test_transcription.py: Debug or benchmark with ease

Workflow

Run ctrlSPEAK in a terminal window:

# If installed with Homebrew
ctrlspeak

# If installed manually (from the project directory with activated venv)
python ctrlspeak.py

Triple-tap Ctrl to start recording
Speak clearly into your microphone
Triple-tap Ctrl again to stop recording
The transcribed text will be automatically pasted at your cursor position
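Under the hood, a triple-tap is just three Ctrl presses close together in time. Here's a minimal sketch of that idea, assuming a 0.5-second window (a made-up value, not ctrlSPEAK's actual setting). `TripleTapDetector` is a hypothetical name; in a real app its `press()` method would be fed from a global keyboard listener, which is what the Accessibility permission is for.

```python
from __future__ import annotations

import time
from collections import deque


class TripleTapDetector:
    """Hypothetical sketch: fire when `taps` presses land within `window` seconds."""

    def __init__(self, window: float = 0.5, taps: int = 3) -> None:
        self.window = window  # assumed value; tune to taste
        self.taps = taps
        self._presses: deque[float] = deque()

    def press(self, timestamp: float | None = None) -> bool:
        """Record one Ctrl press; return True if it completes a triple-tap."""
        now = time.monotonic() if timestamp is None else timestamp
        self._presses.append(now)
        # Forget presses older than the window ending at this press.
        while now - self._presses[0] > self.window:
            self._presses.popleft()
        if len(self._presses) >= self.taps:
            self._presses.clear()  # reset so a 4th quick tap doesn't re-fire
            return True
        return False


if __name__ == "__main__":
    detector = TripleTapDetector()
    recording = False
    # Simulated press times: one quick triple-tap to start, another to stop.
    for t in (0.0, 0.15, 0.3, 5.0, 5.1, 5.2):
        if detector.press(t):
            recording = not recording
            print("recording started" if recording else "recording stopped")
```

Running the demo prints "recording started" and then "recording stopped", mirroring the start/stop toggle described above.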

Models

ctrlSPEAK uses open-source speech recognition models:

Parakeet 0.6B (default): NVIDIA NeMo's nvidia/parakeet-tdt-0.6b-v2 model. Good balance of speed, accuracy, punctuation, and capitalization.
Parakeet 1.1B: NVIDIA NeMo's older nvidia/parakeet-tdt-1.1b model. Potentially higher accuracy in some cases, but outputs lowercase text without punctuation.
Canary: NVIDIA NeMo's nvidia/canary-1b multilingual model (En, De, Fr, Es) with punctuation, but much slower (30.82s vs 0.70s for Parakeet 0.6B in the benchmark below).
Whisper (optional): OpenAI's openai/whisper-large-v3 model. Highly accurate, with excellent punctuation and capitalization; slower than Parakeet 0.6B but far faster than Canary in the benchmark below.
- To use Whisper, install additional dependencies: uv pip install -r requirements-whisper.txt

The models are downloaded automatically from the Hugging Face Hub the first time you use them.

Model Selection

You can specify which model to use with the --model flag:

# Using Homebrew installation
ctrlspeak --model parakeet-0.6b  # Default
ctrlspeak --model parakeet-1.1b  # Older, larger Parakeet
ctrlspeak --model canary         # Multilingual with punctuation
ctrlspeak --model whisper        # OpenAI's model

# Using manual installation
python ctrlspeak.py --model parakeet-0.6b
python ctrlspeak.py --model parakeet-1.1b
python ctrlspeak.py --model canary
python ctrlspeak.py --model whisper

For debugging, you can use the --debug flag:

ctrlspeak --debug
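If you're curious how the --model names line up with the Hugging Face IDs listed above, here's a hypothetical sketch of a parser for these two flags. It is not ctrlSPEAK's actual argument handling; `MODEL_IDS` and `resolve_model` are illustrative names, and only the flag names, model names, and IDs come from this README.

```python
from __future__ import annotations

import argparse

# CLI model names mapped to the Hugging Face IDs listed in this README.
MODEL_IDS = {
    "parakeet-0.6b": "nvidia/parakeet-tdt-0.6b-v2",
    "parakeet-1.1b": "nvidia/parakeet-tdt-1.1b",
    "canary": "nvidia/canary-1b",
    "whisper": "openai/whisper-large-v3",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ctrlspeak")
    parser.add_argument(
        "--model",
        choices=sorted(MODEL_IDS),
        default="parakeet-0.6b",  # the documented default
        help="speech recognition model to load",
    )
    parser.add_argument("--debug", action="store_true", help="enable debug output")
    return parser


def resolve_model(argv: list[str] | None = None) -> tuple[str, bool]:
    """Return (Hugging Face model ID, debug flag); argparse exits on unknown models."""
    args = build_parser().parse_args(argv)
    return MODEL_IDS[args.model], args.debug
```

Restricting --model with `choices` means a typo fails fast with a usage message instead of triggering a download of a model that doesn't exist.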

Models Tested

Parakeet 0.6B (NVIDIA) - nvidia/parakeet-tdt-0.6b-v2 (Default)
Parakeet 1.1B (NVIDIA) - nvidia/parakeet-tdt-1.1b
Canary (NVIDIA) - nvidia/canary-1b
Whisper (OpenAI) - openai/whisper-large-v3

Performance Comparison

| Model | Load Time | Transcription Time | Transcription Quality | Output Example (test.wav) |
| --- | --- | --- | --- | --- |
| Parakeet 0.6B | 5.17s | 0.70s | Good, with punctuation & capitalization | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
| Parakeet 1.1B | 10.07s | 1.08s | Good, no punctuation | "well i don't wish to see it any more observed phoebe turning away her eyes it is certainly very like the old portrait" |
| Canary | 8.15s | 30.82s | Good, with punctuation & capitalization | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| Whisper (large-v3) | 4.0s | 4.5s | Excellent, with punctuation & capitalization | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
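To put the timings in perspective, this short sketch expresses each model's transcription time as a multiple of the default model's, and adds load time to estimate a cold start. The numbers are copied straight from the Performance Comparison table; the function names are just for illustration.

```python
# (load time, transcription time) in seconds for test.wav, from the table.
TIMINGS = {
    "Parakeet 0.6B": (5.17, 0.70),
    "Parakeet 1.1B": (10.07, 1.08),
    "Canary": (8.15, 30.82),
    "Whisper (large-v3)": (4.0, 4.5),
}

BASELINE = TIMINGS["Parakeet 0.6B"][1]  # the default model's transcription time


def relative_speed() -> dict[str, float]:
    """Transcription time as a multiple of Parakeet 0.6B's (1.0 = same speed)."""
    return {name: round(t / BASELINE, 1) for name, (_, t) in TIMINGS.items()}


def cold_start_cost() -> dict[str, float]:
    """Load time plus one transcription: roughly the first-use wait."""
    return {name: round(load + t, 2) for name, (load, t) in TIMINGS.items()}


if __name__ == "__main__":
    for name, ratio in relative_speed().items():
        print(f"{name}: {ratio}x transcription time, {cold_start_cost()[name]}s cold start")
```

By these figures Canary takes about 44x as long as Parakeet 0.6B to transcribe test.wav, and Whisper about 6.4x. Assuming the model stays loaded between recordings, load time is paid once per session and the transcription column is what you notice on each dictation.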

Note: The Whisper model runs in translate mode, which yields proper punctuation and capitalization for English transcription.

Permissions

The app requires:

Microphone access (for recording audio)
Accessibility permissions (for global keyboard shortcuts)

You'll be prompted to grant these permissions on first run.

Troubleshooting

No sound on recording start/stop: Ensure your system volume is not muted
Keyboard shortcuts not working: Grant accessibility permissions in System Settings
Transcription errors: Try speaking more clearly, or switch to a different model with --model

Credits

Sound Effects

Start sound: "Notification Pluck On" from Pixabay
Stop sound: "Notification Pluck Off" from Pixabay

License

MIT License

Release Process

This outlines the steps to create a new release and update the associated Homebrew tap.

1. Prepare the Release:

Ensure the code is stable and tests pass.
Update the version number in the following files:
- VERSION (e.g., 1.2.0)
- __init__.py (__version__ = "1.2.0")
- pyproject.toml (version = "1.2.0")

Commit these version changes:

git add VERSION __init__.py pyproject.toml
git commit -m "Bump version to X.Y.Z"
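Since the same version string lives in three files, it's easy to bump one and miss another. A minimal pre-tag check might look like this. The script and function names are hypothetical, and the regexes assume the exact formats shown above (`1.2.0`, `__version__ = "1.2.0"`, `version = "1.2.0"`).

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Where each release file keeps its version (assumed formats, see above).
PATTERNS = {
    "VERSION": r"^\s*(\S+)\s*$",
    "__init__.py": r"""^__version__\s*=\s*["']([^"']+)["']""",
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
}


def read_versions(root: Path) -> dict[str, str | None]:
    """Return the version found in each file (None if the pattern doesn't match)."""
    found: dict[str, str | None] = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, (root / name).read_text(), re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


def release_tag(root: Path) -> str:
    """Return the vX.Y.Z tag to create, or raise if the three files disagree."""
    versions = read_versions(root)
    distinct = set(versions.values())
    if len(distinct) != 1 or None in distinct:
        raise ValueError(f"version mismatch: {versions}")
    return f"v{distinct.pop()}"


if __name__ == "__main__":
    try:
        print(release_tag(Path(".")))
    except ValueError as err:
        sys.exit(str(err))  # non-zero exit so a script or CI step can stop here
```

Run from the project root, it prints the tag to create (e.g. v1.2.0) or exits with an error listing the mismatched values.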

2. Tag and Push:

Create a git tag matching the version:
```
git tag vX.Y.Z
```
Push the commits and the tag to the remote repository:
```
git push && git push origin vX.Y.Z
```

3. Update Homebrew Tap:

The source code tarball URL is automatically generated based on the tag (usually https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz).

Download the tarball using its URL and calculate its SHA256 checksum:

# Replace URL with the actual tarball link based on the tag
curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
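If you'd rather not pipe curl into shasum, the same checksum can be computed with only the Python standard library. This is a sketch: `tarball_url`, `sha256_of_stream`, and `tarball_sha256` are hypothetical helpers, and the URL format is the one given above.

```python
import hashlib
import urllib.request
from typing import BinaryIO


def tarball_url(owner: str, version: str) -> str:
    """GitHub's source-archive URL for tag v<version>, as described above."""
    return f"https://github.com/{owner}/ctrlspeak/archive/refs/tags/v{version}.tar.gz"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a byte stream chunk by chunk, like `shasum -a 256` does for stdin."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def tarball_sha256(owner: str, version: str) -> str:
    """Download the tag tarball (urlopen follows GitHub's redirect) and hash it."""
    with urllib.request.urlopen(tarball_url(owner, version)) as response:
        return sha256_of_stream(response)
```

For this repository that would be `tarball_sha256("patelnav", "1.2.0")`; the result should match the curl | shasum output.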

Clone or navigate to your Homebrew tap repository (e.g., ../homebrew-ctrlspeak).
Edit the formula file (e.g., Formula/ctrlspeak.rb):
- Update the url line with the tag tarball URL.
- Update the sha256 line with the checksum you calculated.
- Optional: Update the version line if necessary (though it's often inferred).
- Optional: If requirements.txt or dependencies changed, update the depends_on and install steps accordingly.
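The two required formula edits can also be scripted. This hypothetical helper rewrites only the first `url` and `sha256` lines, on the assumption that they belong to the main source stanza; in a Python formula, later url/sha256 pairs usually sit inside `resource` blocks and must be left alone. Review the diff before committing.

```python
import re


def bump_formula(text: str, url: str, sha256: str) -> str:
    """Point the formula's main source at a new tarball and checksum.

    Only the FIRST `url "..."` and `sha256 "..."` lines are replaced, so
    any later ones (e.g. inside `resource` blocks) stay untouched.
    """
    for key, value in (("url", url), ("sha256", sha256)):
        text, count = re.subn(
            rf'^(\s*{key}\s+)"[^"]*"',
            lambda m: f'{m.group(1)}"{value}"',  # keep indentation, swap the value
            text,
            count=1,
            flags=re.MULTILINE,
        )
        if count == 0:
            raise ValueError(f"no {key} line found in formula")
    return text
```

Typical use: read Formula/ctrlspeak.rb, pass its text through `bump_formula` with the new tag URL and checksum, and write the result back.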

Commit and push the changes in the tap repository:

cd ../homebrew-ctrlspeak  # Or wherever your tap repo is
git add Formula/ctrlspeak.rb
git commit -m "Update ctrlspeak to vX.Y.Z"
git push

4. Verify (Optional):

Run brew update locally to fetch the updated formula.
Run brew upgrade ctrlspeak to install the new version.
Test the installed version.
