Initial speaker assignment support #1624
📝 Walkthrough

This pull request adds speaker assignment functionality to the transcript editor, refactors shared operations type definitions, implements multi-mode audio channel support in the listener actor pipeline, and extends stream response handling with offset/metadata methods. Changes span desktop transcript UI components, utility functions for segment building and speaker hints, and Rust audio processing actors.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant User as User UI
    participant Editor as TranscriptEditor
    participant Header as SegmentHeader
    participant Menu as ContextMenu
    participant Store as Store
    participant Callback as onAssignSpeaker
    User->>Header: Hover segment (editor mode)
    Header->>Menu: Render Assign Speaker menu
    Menu->>Store: Fetch available humans
    User->>Menu: Select speaker
    Menu->>Header: handleAssignSpeaker(wordIds, humanId)
    Header->>Callback: Call onAssignSpeaker
    Callback->>Store: Create speaker_hints entries
    Callback->>Store: Register checkpoint
```

```mermaid
sequenceDiagram
    participant Source as SourceActor
    participant Listener as ListenerActor
    participant Processor as ProcessorActor
    participant Joiner as Joiner
    participant Stream as StreamResponse
    Source->>Source: Compute new_mode (platform-specific)
    Source->>Processor: ProcMsg::SetMode(new_mode)
    Source->>Listener: ListenerMsg::ChangeMode(new_mode)
    Processor->>Processor: reset_pipeline()
    Listener->>Listener: Spawn new RX task with mode
    Listener->>Joiner: pop_pair(mode)
    alt mode == Single
        Joiner->>Joiner: Mix mic + spk
    else mode == Dual
        Joiner->>Joiner: Return separate mic/spk
    end
    Listener->>Stream: apply_offset()
    Listener->>Stream: set_extra(started_unix_secs)
```
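To make the Single/Dual branch concrete, here is a minimal sketch of the mode-dependent pairing step from the second diagram; `PairOutput` and the additive clamp mix are illustrative assumptions, not the actual Joiner API:

```rust
// Sketch of pop_pair(mode): Single collapses mic + spk into one mixed
// channel; Dual keeps the tracks separate. Types are assumed for
// illustration only.
#[derive(Clone, Copy, PartialEq)]
enum ChannelMode {
    Single,
    Dual,
}

enum PairOutput {
    Mixed(Vec<f32>),
    Separate { mic: Vec<f32>, spk: Vec<f32> },
}

fn pop_pair(mode: ChannelMode, mic: Vec<f32>, spk: Vec<f32>) -> PairOutput {
    match mode {
        // Sum samples and clamp to [-1.0, 1.0] to avoid clipping.
        ChannelMode::Single => PairOutput::Mixed(
            mic.iter()
                .zip(spk.iter())
                .map(|(m, s)| (m + s).clamp(-1.0, 1.0))
                .collect(),
        ),
        // Keep mic and speaker as independent channels.
        ChannelMode::Dual => PairOutput::Separate { mic, spk },
    }
}
```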
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 5
🧹 Nitpick comments (3)
plugins/listener/src/actors/source.rs (2)
233-358: Significant code duplication between use_mixed branches.

The `use_mixed = true` block (lines 236-294) and `use_mixed = false` block (lines 302-356) contain nearly identical logic for creating mic/speaker streams and processing them in a `tokio::select!` loop. The only differences are platform-specific guards and minor variable handling. Consider extracting the common stream processing logic into a helper function to reduce duplication and improve maintainability:
```rust
async fn process_audio_streams(
    mic_device: Option<String>,
    token: CancellationToken,
    stream_cancel_token: CancellationToken,
    mic_muted: Arc<AtomicBool>,
    myself: ActorRef<SourceMsg>,
) {
    let mic_stream = {
        let mut mic_input = AudioInput::from_mic(mic_device).unwrap();
        ResampledAsyncSource::new(mic_input.stream(), SAMPLE_RATE).chunks(AEC_BLOCK_SIZE)
    };

    tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;

    let spk_stream = {
        let mut spk_input = AudioInput::from_speaker();
        ResampledAsyncSource::new(spk_input.stream(), SAMPLE_RATE).chunks(AEC_BLOCK_SIZE)
    };

    // ... common select! loop logic
}
```

Then both branches can call this helper with appropriate cfg guards.
273-279: Consider caching zero buffer for muted mic data.

The muting logic allocates a new `Vec<f32>` filled with zeros for each muted chunk. While correct, this could be optimized by caching a zero buffer (similar to `silence_cache` in processor.rs) or using a pre-allocated static buffer. Example optimization:
```rust
// At SourceState level
zero_buffer: Arc<[f32]>, // pre-allocated zeros matching AEC_BLOCK_SIZE

// In processing:
let output_data = if mic_muted.load(Ordering::Relaxed) {
    zero_buffer.clone()
} else {
    Arc::from(data)
};
```

This would eliminate repeated allocations during muted periods.
apps/desktop/src/utils/speaker-hints.ts (1)
37-56: Consider extracting duplicate JSON parsing logic.

The `user_speaker_assignment` handling is implemented correctly with proper validation. However, the JSON parsing logic (lines 38-46) is duplicated in the `parseProviderSpeakerIndex` implementation (lines 68-76). Consider extracting a shared helper:
```diff
+const parseJsonValue = (raw: unknown): unknown | undefined => {
+  if (raw == null) {
+    return undefined;
+  }
+
+  return typeof raw === "string"
+    ? (() => {
+        try {
+          return JSON.parse(raw);
+        } catch {
+          return undefined;
+        }
+      })()
+    : raw;
+};
+
 export function convertStorageHintsToRuntime(
   storageHints: SpeakerHintStorage[],
   wordIdToIndex: Map<string, number>,
 ): RuntimeSpeakerHint[] {
   const hints: RuntimeSpeakerHint[] = [];
   storageHints.forEach((hint) => {
     // ... existing validation ...
     if (hint.type === "provider_speaker_index") {
       const parsed = parseProviderSpeakerIndex(hint.value);
       // ...
     } else if (hint.type === "user_speaker_assignment") {
-      const data = typeof hint.value === "string"
-        ? (() => {
-            try {
-              return JSON.parse(hint.value);
-            } catch {
-              return undefined;
-            }
-          })()
-        : hint.value;
+      const data = parseJsonValue(hint.value);
       if (data && typeof data === "object" && "human_id" in data && typeof data.human_id === "string") {
         // ...
       }
     }
   });
 }

 const parseProviderSpeakerIndex = (raw: unknown): ProviderSpeakerIndexHint | undefined => {
-  if (raw == null) {
-    return undefined;
-  }
-
-  const data = typeof raw === "string"
-    ? (() => {
-        try {
-          return JSON.parse(raw);
-        } catch {
-          return undefined;
-        }
-      })()
-    : raw;
+  const data = parseJsonValue(raw);
   return providerSpeakerIndexSchema.safeParse(data).data;
 };
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (2 hunks)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/index.tsx (1 hunks)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (6 hunks)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1 hunks)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (4 hunks)
- apps/desktop/src/devtool/seed/data/curated.json (1 hunks)
- apps/desktop/src/utils/segment.test.ts (1 hunks)
- apps/desktop/src/utils/segment.ts (3 hunks)
- apps/desktop/src/utils/speaker-hints.ts (1 hunks)
- owhisper/owhisper-client/src/lib.rs (1 hunks)
- owhisper/owhisper-interface/src/stream.rs (1 hunks)
- plugins/listener/src/actors/listener.rs (6 hunks)
- plugins/listener/src/actors/mod.rs (1 hunks)
- plugins/listener/src/actors/processor.rs (7 hunks)
- plugins/listener/src/actors/session.rs (6 hunks)
- plugins/listener/src/actors/source.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (12)
apps/desktop/src/components/main/body/sessions/note-input/transcript/index.tsx (1)
- apps/desktop/src/contexts/listener.tsx (1)
  - `useListener` (33-47)

plugins/listener/src/actors/session.rs (2)
- plugins/listener/src/actors/listener.rs (1)
  - `name` (51-53)
- plugins/listener/src/actors/source.rs (1)
  - `name` (48-50)

owhisper/owhisper-interface/src/stream.rs (1)
- plugins/listener/js/bindings.gen.ts (2)
  - `StreamResponse` (98-98)
  - `Extra` (93-93)

apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (1)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (1)
  - `TranscriptContainer` (25-104)

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (2)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1)
  - `Operations` (1-4)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (1)
  - `SegmentHeader` (18-106)

apps/desktop/src/utils/speaker-hints.ts (1)
- packages/db/src/schema.ts (2)
  - `ProviderSpeakerIndexHint` (287-287)
  - `providerSpeakerIndexSchema` (281-285)

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (3)
- apps/desktop/src/utils/segment.ts (1)
  - `Segment` (31-34)
- apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1)
  - `Operations` (1-4)
- packages/db/src/schema.ts (1)
  - `humans` (61-74)

apps/desktop/src/utils/segment.test.ts (1)
- apps/desktop/src/utils/segment.ts (2)
  - `SegmentKey` (36-40)
  - `SegmentKey` (42-46)

plugins/listener/src/actors/processor.rs (1)
- crates/audio-utils/src/lib.rs (1)
  - `f32_to_i16_bytes` (51-61)

plugins/listener/src/actors/listener.rs (2)
- crates/ws/src/client.rs (1)
  - `finalize_with_text` (23-27)
- owhisper/owhisper-interface/src/stream.rs (4)
  - `default` (60-67)
  - `default` (82-93)
  - `apply_offset` (141-160)
  - `set_extra` (162-166)

apps/desktop/src/utils/segment.ts (2)
- plugins/db/js/bindings.gen.ts (1)
  - `SpeakerIdentity` (200-200)
- packages/db/src/schema.ts (2)
  - `words` (130-141)
  - `speakerHints` (144-154)

plugins/listener/src/actors/source.rs (3)
- crates/audio/src/lib.rs (1)
  - `is_using_headphone` (204-217)
- plugins/listener/src/actors/listener.rs (1)
  - `name` (51-53)
- plugins/listener/src/actors/processor.rs (1)
  - `name` (53-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci (macos, macos-14)
🔇 Additional comments (20)
plugins/listener/src/actors/source.rs (2)
8-8: LGTM! Clean addition of ChannelMode state and query capability.

The `GetMode` RPC pattern matches the existing `GetMicMute`/`GetMicDevice` patterns, and initialization to `Dual` is appropriate.

Also applies to: 23-23, 42-42, 122-122
161-165: LGTM! GetMode handler follows established patterns.

The implementation correctly returns the current mode via RPC reply, consistent with other getter handlers.
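As a generic illustration of the request/reply shape these getters share (the actual code uses ractor's RpcReplyPort; the tokio oneshot channel below is a stand-in):

```rust
// Request/reply getter pattern sketched with plain tokio primitives.
use tokio::sync::{mpsc, oneshot};

#[derive(Clone, Copy, Debug, PartialEq)]
enum ChannelMode {
    Single,
    Dual,
}

enum SourceMsg {
    GetMode(oneshot::Sender<ChannelMode>),
    SetMode(ChannelMode),
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<SourceMsg>(8);

    // Actor task: owns the mode and answers queries.
    tokio::spawn(async move {
        let mut mode = ChannelMode::Dual; // initialized to Dual, per the review
        while let Some(msg) = rx.recv().await {
            match msg {
                SourceMsg::GetMode(reply) => {
                    let _ = reply.send(mode); // ignore dropped reply ports
                }
                SourceMsg::SetMode(new_mode) => mode = new_mode,
            }
        }
    });

    // Caller side: send the query and await the reply.
    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send(SourceMsg::GetMode(reply_tx)).await.unwrap();
    assert_eq!(reply_rx.await.unwrap(), ChannelMode::Dual);
}
```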
plugins/listener/src/actors/processor.rs (5)
11-11: LGTM! Clean mode-driven architecture additions.

The `SetMode`/`Reset` message variants and `mode` state field establish a clear control flow for switching between Single and Dual processing modes.

Also applies to: 20-21, 36-37, 76-76
39-47: LGTM! Comprehensive pipeline reset.

The `reset_pipeline` method properly clears all stateful components including joiner queues, cached audio, AGC instances, and timing state.
99-107: LGTM! Proper mode change handling with conditional reset.

The `SetMode` handler correctly checks whether the mode has actually changed before resetting the pipeline, avoiding unnecessary disruption.
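A condensed sketch of the conditional-reset flow described in these two comments; field names follow the review text, while the types and internals are assumptions:

```rust
// Mode change with conditional pipeline reset (illustrative shapes only).
use std::collections::VecDeque;

#[derive(Clone, Copy, PartialEq)]
enum ChannelMode {
    Single,
    Dual,
}

struct ProcessorState {
    mode: ChannelMode,
    mic_queue: VecDeque<Vec<f32>>,
    spk_queue: VecDeque<Vec<f32>>,
    last_spk: Option<Vec<f32>>,
    silence_cache: Option<Vec<f32>>, // allocation cache: survives resets
}

impl ProcessorState {
    fn reset_pipeline(&mut self) {
        // Drop all stateful audio so the new mode starts clean; the
        // silence cache is only an optimization, so it is kept.
        self.mic_queue.clear();
        self.spk_queue.clear();
        self.last_spk = None;
    }

    fn set_mode(&mut self, new_mode: ChannelMode) {
        // Reset only on an actual change, avoiding needless disruption.
        if self.mode != new_mode {
            self.mode = new_mode;
            self.reset_pipeline();
        }
    }
}
```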
131-147: LGTM! Mode-dependent audio routing is correctly implemented.

In Single mode, `spk_bytes` contains the mixed (mic + spk) audio, which is appropriate for single-channel transcription. The clamping prevents overflow/clipping.

Note: RecorderActor (lines 118-123) always receives mixed audio regardless of mode, which appears intentional.
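A minimal sketch of that Single-mode mix, assuming f32 samples and a saturating little-endian i16 conversion; the hypothetical `mix_to_bytes` stands in for the mixing loop combined with an `f32_to_i16_bytes`-style helper:

```rust
// Sum mic and speaker samples, clamp to [-1.0, 1.0] to prevent clipping,
// then serialize as i16 little-endian bytes for the transcription stream.
fn mix_to_bytes(mic: &[f32], spk: &[f32]) -> Vec<u8> {
    mic.iter()
        .zip(spk.iter())
        .map(|(m, s)| (m + s).clamp(-1.0, 1.0))
        .flat_map(|sample| ((sample * i16::MAX as f32) as i16).to_le_bytes())
        .collect()
}
```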
195-205: LGTM! Efficient silence caching and queue overflow protection.

The `get_silence` method properly caches zero buffers to avoid repeated allocations, and queue size limits with overflow warnings prevent unbounded memory growth.

Note: `reset()` correctly preserves the `silence_cache` since it's just an optimization cache.

Also applies to: 209-220
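A small sketch of the caching-plus-bounded-queue idea; the queue limit and field shapes are assumptions for illustration:

```rust
// Cached zero buffer plus bounded queue push (illustrative shapes only).
use std::collections::VecDeque;
use std::sync::Arc;

const MAX_QUEUE: usize = 64; // assumed limit, for illustration

struct Queues {
    silence_cache: Option<Arc<[f32]>>,
    spk_queue: VecDeque<Arc<[f32]>>,
}

impl Queues {
    // Return a shared zero buffer, allocating only on first use (or when
    // the requested length changes).
    fn get_silence(&mut self, len: usize) -> Arc<[f32]> {
        match &self.silence_cache {
            Some(buf) if buf.len() == len => buf.clone(),
            _ => {
                let buf: Arc<[f32]> = vec![0.0f32; len].into();
                self.silence_cache = Some(buf.clone());
                buf
            }
        }
    }

    // Push with overflow protection: drop the oldest chunk and warn.
    fn push_spk(&mut self, chunk: Arc<[f32]>) {
        if self.spk_queue.len() >= MAX_QUEUE {
            self.spk_queue.pop_front();
            eprintln!("warning: spk queue overflow, dropping oldest chunk");
        }
        self.spk_queue.push_back(chunk);
    }
}
```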
apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (2)
4-4: LGTM!

The `id` utility import is correctly added to support hint ID generation.
58-61: LGTM!

The operations object correctly wires both `onDeleteWord` and `onAssignSpeaker` handlers to the TranscriptContainer.

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (4)
21-21: LGTM!

Correctly imports the centralized `Operations` type to replace the local `WordOperations` definition.
30-30: LGTM!

The refactoring to use the centralized `Operations` type is consistent throughout the component hierarchy (TranscriptContainer → RenderTranscript → SegmentRenderer → WordSpan).

Also applies to: 133-133, 182-182, 243-243
115-115: LGTM!

The decorative separator text change improves the visual distinction between transcript segments.
197-197: LGTM!

Correctly passes the `operations` prop to `SegmentHeader`, enabling the context menu functionality for speaker assignment.

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (6)
4-16: LGTM!

Correctly imports ContextMenu components, the `Operations` type, and necessary utilities to support the speaker assignment UI.
18-18: LGTM!

The function signature correctly extends `SegmentHeader` to accept an optional `operations` prop, maintaining backward compatibility while enabling editor functionality.
44-59: LGTM!

Good implementation of editor mode detection and speaker assignment handler:
- Properly derives mode based on operations presence
- Safely filters words with IDs before non-null assertion
- Correctly memoizes the assignment handler with appropriate dependencies
78-103: LGTM!

The context menu implementation demonstrates good UX practices:
- Only enabled in editor mode when there are assignable words
- Provides clear hierarchical menu structure (Assign Speaker → human list)
- Gracefully handles empty humans list with a disabled item
- Includes defensive fallback from `human?.name` to `humanId`
108-125: LGTM!

The `useSegmentColor` implementation correctly generates distinct colors per speaker using chroma-js with channel-based palettes and proper memoization.
127-148: LGTM!

The `useSpeakerLabel` hook demonstrates excellent defensive coding:
- Prioritizes human names from the store when available
- Falls back to channel-based labels using the `ChannelProfile` enum
- Provides sensible defaults for all cases (speaker index or channel letter)
apps/desktop/src/utils/speaker-hints.ts (1)
63-79: No breaking changes detected.

The verification confirms that `parseProviderSpeakerIndex` is only used internally within apps/desktop/src/utils/speaker-hints.ts (line 25), with no imports from other files in the codebase. The function is not exported, so the change is safe.
In plugins/listener/src/actors/listener.rs:

```rust
let outbound = tokio_stream::StreamExt::map(
    tokio_stream::wrappers::ReceiverStream::new(rx),
    |msg| match msg {
        MixedMessage::Audio((_mic, spk)) => MixedMessage::Audio(spk),
        MixedMessage::Control(c) => MixedMessage::Control(c),
    },
);
```
Single-channel stream is forwarding the wrong audio track
In the single-mode branch we drop the mic buffers and forward the speaker channel instead. In Single mode the upstream Source only guarantees the first tuple element (mic); the second element is often empty. With the current mapping every single-channel session will stream silence (or the wrong track) to Deepgram, breaking transcription entirely. Please forward the mic data here.
```diff
-        MixedMessage::Audio((_mic, spk)) => MixedMessage::Audio(spk),
+        MixedMessage::Audio((mic, _spk)) => MixedMessage::Audio(mic),
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let outbound = tokio_stream::StreamExt::map(
    tokio_stream::wrappers::ReceiverStream::new(rx),
    |msg| match msg {
        MixedMessage::Audio((mic, _spk)) => MixedMessage::Audio(mic),
        MixedMessage::Control(c) => MixedMessage::Control(c),
    },
);
```
🤖 Prompt for AI Agents
```
In plugins/listener/src/actors/listener.rs around lines 208 to 214, the
single-channel mapping currently drops the mic buffer and forwards the speaker
buffer, which is wrong because in Single mode the upstream Source only
guarantees the mic (first tuple element); change the mapping so that for
MixedMessage::Audio((mic, _)) you forward MixedMessage::Audio(mic) instead of
the speaker, and keep MixedMessage::Control unchanged; this ensures
single-channel sessions send the mic track to Deepgram.
```
No description provided.