Local, push-to-talk speech-to-text for macOS. Hold a hotkey, speak, release — the transcript pastes into whatever text field is focused. Everything runs on-device via Apple Neural Engine. Audio never leaves your Mac.
Apple Silicon + macOS 14 (Sonoma) or later required.
Download the latest Uhura.dmg from Releases, open it, and drag Uhura to Applications.
Because Uhura is ad-hoc signed and not notarized, macOS may say the app is damaged on first launch. If so, run once in Terminal:
xattr -cr /Applications/Uhura.appFirst launch: Uhura downloads the Whisper small.en model (~460 MB) from Hugging Face on first run. A progress indicator appears in the menu bar. Subsequent launches load in 1–3 s.
- Launch Uhura — the onboarding window opens automatically.
- Grant Microphone access (system prompt).
- Grant Accessibility access — click the button, toggle Uhura on in System Settings, and return. The app detects the grant within 1 s; no restart needed.
- Record your push-to-talk hotkey (any key or chord).
- Click Continue.
- Focus any text field in any app.
- Hold your hotkey and speak.
- A floating HUD shows a live waveform and partial transcript while you hold.
- Release — the final transcript pastes into the focused field via Cmd+V.
Menu bar icon → Settings…
| Setting | Default | Notes |
|---|---|---|
| Input device | System default | USB and Bluetooth mics appear when connected |
| Push-to-talk hotkey | (unset) | Set in onboarding or here |
| Whisper model | small.en (~460 MB) |
Switching requires a restart |
| Start/stop sounds | On | Subtle Tink/Pop chimes |
| Use Unicode keystrokes | Off | Slower fallback that avoids the clipboard; use if Cmd+V paste fails in a specific app |
Requirements: Xcode 16+, XcodeGen
brew install xcodegen
git clone https://github.com/jcleigh/uhura
cd uhura
xcodegen generate
open Uhura.xcodeprojThe first build resolves SPM packages (WhisperKit + KeyboardShortcuts) — allow a couple of minutes.
To run tests: ⌘U in Xcode, or:
xcodebuild test -project Uhura.xcodeproj -scheme Uhura -destination 'platform=macOS'Push a version tag and GitHub Actions builds and publishes the DMG automatically:
git tag v1.0.0
git push origin v1.0.0The workflow installs XcodeGen, generates the project, resolves packages, builds a Release binary, ad-hoc signs it, packages Uhura.dmg and Uhura.zip, and creates a GitHub Release. See .github/workflows/build-release.yml.
Audio pipeline — WhisperKit's AudioProcessor owns AVAudioEngine capture, 48 kHz → 16 kHz Float32 resampling, sample storage, and relativeEnergy: [Float] for the waveform. The app calls startRecordingLive and stopRecording directly rather than going through AudioStreamTranscriber, which is a Swift actor whose executor has no run loop and causes AVAudioEngine.start() to fail with -10877 (kAudioHardwareBadStreamError). All AVAudioEngine calls are made on @MainActor (main thread), which has a run loop.
Dual-pass transcription — while recording, a detached Task calls whisperKit.transcribe(audioArray:) every 800 ms (after ≥ 1 s of new audio accumulates) to drive the HUD partial text. On key-up, stopAndFinalize() runs a single final transcribe pass over the full captured buffer for the cleanest paste result. Streaming = liveness; single-shot = accuracy.
Paste injection — NSPasteboard snapshot → set transcript → 20 ms settle → synthesized Cmd+V via CGEvent on .cghidEventTap → 150 ms → restore original clipboard. .cghidEventTap (not .cgSessionEventTap) is required for reliable delivery into Electron/Chromium apps.
Permissions — Accessibility is detected by polling AXIsProcessTrusted() at 1 Hz. macOS 13+ applies the grant without an app restart. Microphone permission uses AVAudioApplication.requestRecordPermission() on macOS 14+ (not AVCaptureDevice), which both grants TCC and primes the coreaudiod XPC session needed by AVAudioEngine.
Settings persistence — @Observable only tracks stored properties. Settings.swift uses stored properties initialized from UserDefaults with didSet persistence rather than computed properties, ensuring SwiftUI observation picks up every change.
Uhura/
├── App/
│ ├── UhuraApp.swift @main, MenuBarExtra + Settings scene
│ ├── AppDelegate.swift NSApplicationDelegate, owns AppCoordinator
│ └── AppCoordinator.swift wires hotkey → engine → HUD → paste
├── Capture/
│ ├── DictationEngine.swift owns WhisperKit, audio capture, transcription
│ └── Settings.swift UserDefaults-backed preferences (@Observable)
├── Hotkey/
│ └── HotkeyController.swift KeyboardShortcuts onKeyDown/onKeyUp
├── Injection/
│ ├── PasteInjector.swift clipboard save → set → Cmd+V → restore
│ └── AccessibilityPermission.swift AXIsProcessTrusted + 1 Hz polling
├── HUD/
│ ├── HUDWindowController.swift NSPanel, .floating, .canJoinAllSpaces
│ ├── HUDView.swift SwiftUI waveform + partial text overlay
│ └── WaveformView.swift bar viz of AudioProcessor.relativeEnergy
├── MenuBar/
│ ├── MenuBarIcon.swift custom Uhura badge template image + state badges
│ └── SettingsScene.swift SwiftUI Settings + Onboarding views
├── Audio/
│ └── Pings.swift NSSound start/stop chimes
└── Resources/
└── Assets.xcassets/ AppIcon (all sizes), MenuBarIcon template
- Not sandboxed — cannot ship to the Mac App Store. Developer ID + notarization is the supported distribution path.
- Memory:
small.enruns ~800 MB–1.2 GB resident. Usetiny.enorbase.enfor lower-memory machines (configurable in Settings). - Single-modifier hotkeys (e.g. Right Option alone) require macOS 15.2+ and KeyboardShortcuts v2.1.0+. Use a chord on older systems.
- Apple secure text fields and password manager master password prompts intentionally block synthesized paste — clipboard is still restored cleanly.
- Sub-250 ms keypresses are silently dropped — Whisper hallucinates on near-empty audio.
MIT — see LICENSE.