Add on-device dictation with Whisper by moehaje · Pull Request #68 · Dimillian/CodexMonitor

moehaje · 2026-01-16T21:17:40Z

What’s in here

Adds on-device dictation in the composer (mic button + waveform).
Adds Settings for dictation (enable toggle, model picker, preferred language).
Models download on demand into app data (no app bundle bloat).
Backend capture + Whisper inference + events wired to the UI.
CMake check + CI installs for more reliable builds.

Why these decisions

I didn’t bundle models: releases stay small and updates stay quick.
Auto-detect stays on; preferred language is just a hint.
The preferred-language hint only kicks in when the preferred language's probability is within 0.30 of the best detected one:
if best_prob > 0.0 && (best_prob - pref_prob) <= 0.30
We only insert the final transcript on stop (keeps UI stable and predictable).
Added Escape to cancel dictation immediately.
Chose a lightweight waveform (no extra UI libs).
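
A minimal sketch of the preferred-language rule above, assuming the backend has one probability per detected language. `chooseLanguage`, `LanguageProbs`, and `HINT_MARGIN` are illustrative names, not identifiers from this PR; only the comparison itself comes from the description.

```typescript
// Hypothetical shape: language code -> detection probability.
type LanguageProbs = Record<string, number>;

// Margin quoted in the PR description.
const HINT_MARGIN = 0.30;

// Returns the language to use: the preferred one when it is close enough
// to the auto-detected best, otherwise the auto-detected best.
function chooseLanguage(probs: LanguageProbs, preferred: string | null): string | null {
  let best: string | null = null;
  let bestProb = 0;
  const langs = Object.keys(probs);
  for (let i = 0; i < langs.length; i++) {
    const p = probs[langs[i]];
    if (p > bestProb) {
      best = langs[i];
      bestProb = p;
    }
  }
  if (preferred === null) return best;

  const prefProb = probs[preferred] ?? 0;
  // Same condition as in the description above.
  if (bestProb > 0.0 && bestProb - prefProb <= HINT_MARGIN) {
    return preferred;
  }
  return best;
}
```

With this rule, auto-detect still wins whenever the preferred language is clearly not what was spoken.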

Implementation details

Avoid blocking async Tauri commands (use async primitives instead of sync waits).
Audio callback stays realtime-safe (no heavy work or event emission there).
Level events are throttled (~30ms) so UI stays smooth.
Download progress is throttled (~150ms) to avoid spammy updates.
Download hardening: temp .partial, timeouts, SHA256 verification, atomic rename.
Whisper context cached per model to reduce processing latency.
Transcript insertion adds whitespace heuristics to avoid word-sticking.
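
The throttling mentioned for level and progress events can be sketched as below. This is an illustration only: `createThrottle` is a hypothetical helper, the backend in this PR is Rust, and the clock is injectable here purely so the behaviour can be checked without real timers.

```typescript
// Forwards a value at most once per `intervalMs`; values arriving
// sooner are dropped, since a newer one will follow shortly.
function createThrottle<T>(
  intervalMs: number,
  emit: (value: T) => void,
  now: () => number = Date.now,
): (value: T) => void {
  let lastEmit = -Infinity;
  return (value: T) => {
    const t = now();
    if (t - lastEmit >= intervalMs) {
      lastEmit = t;
      emit(value);
    }
  };
}
```

An interval of about 30 ms would match the level events, and about 150 ms the download progress.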

Demo

Video:

moehaje-codexmonitor-dictation.mp4

Notes / observations

On my machine, Base felt like the best balance (good English results + reasonable processing time).
Larger models improved accuracy a bit, but processing time was noticeably slower.

How to test

Open Settings → Dictation → Enable dictation.
Pick a model and download it.
Click the mic button in the composer, speak, then stop.
Confirm transcript inserts at cursor and send works.
Press Escape while listening → dictation cancels immediately.
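
For the cursor-insertion step, the whitespace heuristics listed under implementation details might look roughly like this. `insertTranscript` and its exact rules are assumptions for illustration; the PR does not spell out the heuristics.

```typescript
// Inserts `transcript` into `text` at `cursor`, adding a space where two
// words would otherwise run together. Returns the new text and cursor.
function insertTranscript(
  text: string,
  cursor: number,
  transcript: string,
): { text: string; cursor: number } {
  const before = text.slice(0, cursor);
  const after = text.slice(cursor);
  let piece = transcript.trim();
  if (piece.length === 0) return { text, cursor };

  // Leading space unless at the start or already preceded by whitespace.
  if (before.length > 0 && !/\s$/.test(before)) piece = " " + piece;
  // Trailing space unless at the end or followed by whitespace/punctuation.
  if (after.length > 0 && !/^[\s.,!?;:]/.test(after)) piece = piece + " ";

  return { text: before + piece + after, cursor: before.length + piece.length };
}
```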


Dimillian · 2026-01-17T06:10:44Z

Review notes (from deep review):

Composer Enter key can still send while dictation is listening/processing (send button is disabled but key handler doesn't check). Suggest short-circuiting when dictation is busy to prevent accidental submits.
Microphone stream errors emit DictationEvent::Error but don't reset dictation session state back to idle; UI may stay stuck in listening. Suggest setting session_state = Idle and emitting state update on error.
Audio capture buffer grows unbounded until stop; long dictations can spike memory/CPU (also cloned on stop). Suggest max duration/size cap or ring buffer/chunked streaming.

Refs:

src/features/composer/components/Composer.tsx (Enter handler)
src-tauri/src/dictation.rs (err_fn in start_capture_thread)
src-tauri/src/dictation.rs (audio buffer extend in build_stream)
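
As an illustration of the third note, one possible capping policy is to stop accepting samples once a fixed limit is reached. The sketch below uses a hypothetical `CappedSampleBuffer`; the real buffer lives in the Rust backend and may use a different policy (for example a ring buffer).

```typescript
// Keeps at most `maxSamples` samples; anything beyond the cap is ignored.
class CappedSampleBuffer {
  private samples: number[] = [];
  private maxSamples: number;

  constructor(maxSamples: number) {
    this.maxSamples = maxSamples;
  }

  // Appends as much of `chunk` as fits. Returns true while there is
  // still room afterwards, false once the cap has been reached.
  push(chunk: number[]): boolean {
    const room = this.maxSamples - this.samples.length;
    if (room <= 0) return false;
    const count = Math.min(room, chunk.length);
    for (let i = 0; i < count; i++) this.samples.push(chunk[i]);
    return this.samples.length < this.maxSamples;
  }

  size(): number {
    return this.samples.length;
  }

  // Hands the samples over without copying and resets the buffer.
  take(): number[] {
    const out = this.samples;
    this.samples = [];
    return out;
  }
}
```

A caller could treat a `false` return as the signal to stop capture, and `take()` avoids the extra clone on stop that the note mentions.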

moehaje · 2026-01-17T08:16:18Z

You are absolutely right! :) I addressed all three points. Fixes on the way!
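
For the first point, the Enter guard could be sketched as follows. `handleComposerKeyDown` and `KeyLike` are hypothetical stand-ins, not the actual code in Composer.tsx, and treating Shift+Enter as a newline is an assumption.

```typescript
type DictationState = "idle" | "listening" | "processing";

// Minimal stand-in for the parts of a keyboard event this sketch uses.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  preventDefault(): void;
}

// Sends on Enter only when dictation is idle.
function handleComposerKeyDown(
  event: KeyLike,
  dictationState: DictationState,
  send: () => void,
): void {
  // Assumption: Shift+Enter inserts a newline and is left alone.
  if (event.key !== "Enter" || event.shiftKey) return;
  event.preventDefault();
  // Dictation is listening or processing: swallow the key, do not submit.
  if (dictationState !== "idle") return;
  send();
}
```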


moehaje · 2026-01-17T08:48:47Z

Another two edge-case fixes on the way!


moehaje · 2026-01-17T09:08:31Z

If the user starts and stops dictation but says nothing, no error should be thrown or rendered.
Escape now cancels while processing as well, not only while listening.

Dimillian · 2026-01-17T09:13:35Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b5c694098


chatgpt-codex-connector · 2026-01-17T09:18:18Z

src/features/settings/components/SettingsView.tsx

+                    onChange={(event) =>
+                      void onUpdateAppSettings({
+                        ...appSettings,
+                        dictationHoldKey: event.target.value || null,
+                      })


Avoid sending null for dictationHoldKey

Selecting “Off” sets dictationHoldKey to null, but the Tauri settings struct expects a non‑optional String (dictation_hold_key in src-tauri/src/types.rs). When the user saves settings with Off selected, update_app_settings will try to deserialize null into a String and fail, so settings updates will error or revert. This only reproduces when Off is chosen; keeping the field a string sentinel (e.g., "") or making the backend field Option<String> would avoid the mismatch.
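
A sketch of the string-sentinel option on the frontend side, assuming "" means Off. `AppSettings` here shows only the one field, and `normalizeHoldKey` and `withHoldKey` are hypothetical helpers rather than code from this PR.

```typescript
// Only the field under discussion; the real settings object has more.
interface AppSettings {
  dictationHoldKey: string; // "" means "Off"; never null
}

// Maps the raw select value to the string the backend expects.
function normalizeHoldKey(raw: string | null | undefined): string {
  return raw ?? "";
}

// Returns a copy of the settings with a backend-safe hold key.
function withHoldKey(settings: AppSettings, raw: string | null | undefined): AppSettings {
  return { ...settings, dictationHoldKey: normalizeHoldKey(raw) };
}
```

Keeping the value a string on both sides avoids changing the Rust struct; the alternative named above is `Option<String>` on the backend.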


chatgpt-codex-connector · 2026-01-17T09:18:18Z

src/App.tsx

+      holdDictationActive.current = false;
+      if (dictationState === "listening") {
+        void stopDictation();
+      }


Stop hold-to-dictate even if state update lags

The keyup handler only calls stopDictation() when dictationState === "listening". If the user taps and releases the hold key quickly, the backend may still report idle at keyup time, so no stop is sent; when the state flips to listening a moment later, dictation continues even though the key is no longer held. This can leave the mic running unexpectedly; consider tracking the key‑up intent and stopping once listening begins, or canceling immediately on keyup regardless of current state.
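
The first suggestion (remember the key-up intent and stop once listening begins) can be sketched as a small controller. `HoldToDictate` and its method names are hypothetical; `start` and `stop` stand in for the app's real dictation calls.

```typescript
type DictationState = "idle" | "listening" | "processing";

class HoldToDictate {
  private holding = false;
  private stopPending = false;
  private start: () => void;
  private stop: () => void;

  constructor(start: () => void, stop: () => void) {
    this.start = start;
    this.stop = stop;
  }

  keyDown(state: DictationState): void {
    if (this.holding || state !== "idle") return;
    this.holding = true;
    this.stopPending = false;
    this.start();
  }

  keyUp(state: DictationState): void {
    if (!this.holding) return;
    this.holding = false;
    if (state === "listening") {
      this.stop();
    } else {
      // The backend has not reported "listening" yet: remember the intent.
      this.stopPending = true;
    }
  }

  // Call whenever the backend reports a new dictation state.
  stateChanged(state: DictationState): void {
    if (this.stopPending && state === "listening") {
      this.stopPending = false;
      this.stop();
    } else if (state === "idle") {
      this.stopPending = false; // nothing left to stop
    }
  }
}
```

A quick tap then still ends in a stop call as soon as the listening state arrives, so the mic is not left running.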



moehaje added 5 commits January 16, 2026 21:41

feat(dictation): add backend capture, model management, and IPC

da58d76

Wire whisper capture, downloads, and Tauri commands.

feat(dictation): add dictation hooks, events, and composer wiring

70e4c17

Expose dictation state to UI and composer.

feat(settings): add dictation settings + download progress UI

9188525

Add model/language controls and progress UI.

chore(build): add doctor + cmake checks

c6db091

Add cmake checks for dev and CI.

Merge branch 'main' into moehaje/dictation

54631af

Merge branch 'main' into moehaje/dictation

db14c2e

moehaje added 4 commits January 17, 2026 09:34

fix(composer): block send while dictation active

6ef93bc

Prevent Enter submit during dictation.

fix(dictation): reset state on stream error

57bff23

Stop capture and emit idle state on mic errors.

fix(dictation): cap capture buffer size

f4c63fd

Limit stored samples to avoid unbounded growth.

feat(dictation): add hold-to-dictate hotkey

43a0491

Allow configurable modifier hold to start/stop dictation.

moehaje added 2 commits January 17, 2026 09:52

fix(dictation): skip no-speech errors

f2753a1

Avoid error when no transcript is produced.

fix(dictation): allow cancel during processing

2b5c694

Let cancel stop processing and suppress transcript.

chatgpt-codex-connector bot reviewed Jan 17, 2026


moehaje and others added 5 commits January 17, 2026 10:26

fix(dictation): keep hold key string sentinel

c0e3a9b

Avoid null values in app settings and make hold-to-dictate stop on keyup even if state lags.

refactor(dictation): extract hold-to-dictate hook

0400833

Move keyboard handling into a dedicated hook and shared key matcher.

fix(dictation): align hold hook types

5ddbb9d

Merge branch 'main' into moehaje/dictation

c863e7b

Fixes

4ebe2ed

Dimillian merged commit b8a86b9 into Dimillian:main Jan 17, 2026
2 checks passed

Dimillian merged 17 commits into Dimillian:main from moehaje:moehaje/dictation
