
Conversation

@yujonglee
Contributor

No description provided.

@coderabbitai

coderabbitai bot commented Nov 4, 2025

📝 Walkthrough

This pull request adds speaker assignment functionality to the transcript editor, refactors shared operations type definitions, implements multi-mode audio channel support in the listener actor pipeline, and extends stream response handling with offset/metadata methods. Changes span desktop transcript UI components, utility functions for segment building and speaker hints, and Rust audio processing actors.

Changes

Cohort / File(s) Summary
Transcript Speaker Assignment
apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx
Introduces handleAssignSpeaker callback to create speaker hint entries with user-assigned speaker data, validates store state, and registers new checkpoint.
Transcript Component Initialization
apps/desktop/src/components/main/body/sessions/note-input/transcript/index.tsx
Adds inactive state detection and conditionally renders EditingControls based on session status.
Shared Operations Type Refactor
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx
Replaces local WordOperations type with centralized Operations type, updates component signatures, and changes TranscriptSeparator text decoration.
Operations Type Definition
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx
Defines new Operations type with onDeleteWord and onAssignSpeaker callbacks.
Segment Header Speaker Assignment UI
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx
Adds operations prop, renders ContextMenu with Assign Speaker submenu when in editor mode, refactors color/label hooks, and enables speaker selection.
Segment Building Logic
apps/desktop/src/utils/segment.ts
Introduces ChannelProfile enum, refactors speaker tracking with SpeakerState, adds SegmentWord type, reworks segment construction with stateful identity resolution and normalization pipeline.
Segment Tests
apps/desktop/src/utils/segment.test.ts
Adds comprehensive test cases for speaker hint propagation, human_id inference, segment splitting on speaker changes, and multi-channel segmentation.
Speaker Hints Runtime Conversion
apps/desktop/src/utils/speaker-hints.ts
Adds user_speaker_assignment hint type handling in convertStorageHintsToRuntime, changes parseProviderSpeakerIndex visibility to private.
Seed Data
apps/desktop/src/devtool/seed/data/curated.json
Adds two new curated entries with transcript segments.
Audio Stream Response Methods
owhisper/owhisper-interface/src/stream.rs
Adds apply_offset, set_extra, and remap_channel_index methods to StreamResponse.
oWhisper Client Export
owhisper/owhisper-client/src/lib.rs
Re-exports hypr_ws crate.
Multi-Mode Listener Actors
plugins/listener/src/actors/listener.rs
Adds ChannelMode support with ChangeMode message, expands ListenerArgs with mode and session timing, applies offset/extra to responses, and branches RX spawn logic on mode.
Channel Mode Enum
plugins/listener/src/actors/mod.rs
Defines new public ChannelMode enum with Single and Dual variants.
Mode-Aware Audio Processing
plugins/listener/src/actors/processor.rs
Replaces Mixed variant with SetMode and Reset messages, adds mode field to ProcState, implements mode-conditional audio mixing in process_ready, and introduces silence caching in Joiner.
Session Timing & Mode Discovery
plugins/listener/src/actors/session.rs
Adds timing fields (started_at_instant, started_at_system), queries SourceActor for mode, and passes mode and timestamps to ListenerArgs.
Source Mode Computation & Propagation
plugins/listener/src/actors/source.rs
Adds GetMode message, mode field to SourceState, computes platform-specific mode logic, and propagates mode changes to ProcessorActor and ListenerActor.

Sequence Diagrams

sequenceDiagram
    participant User as User UI
    participant Editor as TranscriptEditor
    participant Header as SegmentHeader
    participant Menu as ContextMenu
    participant Store as Store
    participant Callback as onAssignSpeaker

    User->>Header: Hover segment (editor mode)
    Header->>Menu: Render Assign Speaker menu
    Menu->>Store: Fetch available humans
    User->>Menu: Select speaker
    Menu->>Header: handleAssignSpeaker(wordIds, humanId)
    Header->>Callback: Call onAssignSpeaker
    Callback->>Store: Create speaker_hints entries
    Callback->>Store: Register checkpoint
sequenceDiagram
    participant Source as SourceActor
    participant Listener as ListenerActor
    participant Processor as ProcessorActor
    participant Joiner as Joiner
    participant Stream as StreamResponse

    Source->>Source: Compute new_mode (platform-specific)
    Source->>Processor: ProcMsg::SetMode(new_mode)
    Source->>Listener: ListenerMsg::ChangeMode(new_mode)
    Processor->>Processor: reset_pipeline()
    Listener->>Listener: Spawn new RX task with mode
    Listener->>Joiner: pop_pair(mode)
    alt mode == Single
        Joiner->>Joiner: Mix mic + spk
    else mode == Dual
        Joiner->>Joiner: Return separate mic/spk
    end
    Listener->>Stream: apply_offset()
    Listener->>Stream: set_extra(started_unix_secs)
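The mode branch in the diagram above can be sketched in plain Rust. This is a simplified illustration, not the plugin's actual code: `ChannelMode` mirrors the enum added in `actors/mod.rs`, while `pop_pair` here is a stand-in for the Joiner method of the same name.

```rust
#[derive(Clone, Copy, PartialEq)]
enum ChannelMode {
    Single,
    Dual,
}

// Simplified stand-in for the Joiner's pop_pair: in Single mode, mix the mic
// and speaker buffers into one clamped track; in Dual mode, keep them separate.
fn pop_pair(mode: ChannelMode, mic: Vec<f32>, spk: Vec<f32>) -> (Vec<f32>, Option<Vec<f32>>) {
    match mode {
        ChannelMode::Single => {
            // Element-wise sum, clamped to [-1.0, 1.0] to prevent clipping.
            let mixed = mic
                .iter()
                .zip(spk.iter())
                .map(|(m, s)| (m + s).clamp(-1.0, 1.0))
                .collect();
            (mixed, None)
        }
        ChannelMode::Dual => (mic, Some(spk)),
    }
}
```

Note that `zip` truncating to the shorter buffer is a simplification; the real Joiner pads missing chunks with cached silence, as discussed in the processor.rs comments below.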

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Segment building refactor (segment.ts): Significant state machine rewrite with speaker identity resolution, normalization pipeline, and complex segmentation logic requires careful verification of correctness across edge cases.
  • Multi-mode audio pipeline: Substantial refactor across five actor files (listener.rs, processor.rs, source.rs, session.rs, mod.rs) introducing mode-conditional branching, channel remapping, and state management changes.
  • Speaker assignment feature: New callbacks and UI integration across transcript components with store interactions and checkpoint registration.
  • Type system changes: ChannelProfile enum replacing numeric channels affects multiple function signatures and requires tracing impact throughout the codebase.


Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Description check: ❓ Inconclusive. No pull request description was provided by the author, making it impossible to assess whether it relates to the changeset. Resolution: add a description explaining the purpose, scope, and any relevant details of the speaker assignment implementation.
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title 'Initial speaker assignment support' clearly summarizes the main change: adding speaker assignment functionality throughout the codebase.


@coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
plugins/listener/src/actors/source.rs (2)

233-358: Significant code duplication between use_mixed branches.

The use_mixed = true block (lines 236-294) and use_mixed = false block (lines 302-356) contain nearly identical logic for creating mic/speaker streams and processing them in a tokio::select! loop. The only differences are platform-specific guards and minor variable handling.

Consider extracting the common stream processing logic into a helper function to reduce duplication and improve maintainability:

async fn process_audio_streams(
    mic_device: Option<String>,
    token: CancellationToken,
    stream_cancel_token: CancellationToken,
    mic_muted: Arc<AtomicBool>,
    myself: ActorRef<SourceMsg>,
) {
    let mic_stream = {
        let mut mic_input = AudioInput::from_mic(mic_device).unwrap();
        ResampledAsyncSource::new(mic_input.stream(), SAMPLE_RATE).chunks(AEC_BLOCK_SIZE)
    };
    tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
    let spk_stream = {
        let mut spk_input = AudioInput::from_speaker();
        ResampledAsyncSource::new(spk_input.stream(), SAMPLE_RATE).chunks(AEC_BLOCK_SIZE)
    };
    
    // ... common select! loop logic
}

Then both branches can call this helper with appropriate cfg guards.


273-279: Consider caching zero buffer for muted mic data.

The muting logic allocates a new Vec<f32> filled with zeros for each muted chunk. While correct, this could be optimized by caching a zero buffer (similar to silence_cache in processor.rs) or using a pre-allocated static buffer.

Example optimization:

// At SourceState level
zero_buffer: Arc<[f32]>, // pre-allocated zeros matching AEC_BLOCK_SIZE

// In processing:
let output_data = if mic_muted.load(Ordering::Relaxed) {
    zero_buffer.clone()
} else {
    Arc::from(data)
};

This would eliminate repeated allocations during muted periods.
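As a rough sketch of this caching idea (the names `SilenceCache` and `get` are illustrative assumptions, not the plugin's actual API), a length-keyed cache of shared zero buffers could look like:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical sketch of a zero-buffer cache, similar in spirit to the
// silence_cache mentioned for processor.rs; not the actual implementation.
struct SilenceCache {
    buffers: HashMap<usize, Arc<[f32]>>,
}

impl SilenceCache {
    fn new() -> Self {
        Self { buffers: HashMap::new() }
    }

    /// Return a shared buffer of `len` zeros, allocating at most once per length.
    fn get(&mut self, len: usize) -> Arc<[f32]> {
        self.buffers
            .entry(len)
            .or_insert_with(|| Arc::from(vec![0.0f32; len]))
            .clone()
    }
}
```

Cloning the `Arc` is a cheap reference-count bump, so muted periods reuse one allocation per buffer length instead of allocating a fresh `Vec<f32>` per chunk.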

apps/desktop/src/utils/speaker-hints.ts (1)

37-56: Consider extracting duplicate JSON parsing logic.

The user_speaker_assignment handling is implemented correctly with proper validation. However, the JSON parsing logic (lines 38-46) is duplicated with the parseProviderSpeakerIndex implementation (lines 68-76).

Consider extracting a shared helper:

+const parseJsonValue = (raw: unknown): unknown | undefined => {
+  if (raw == null) {
+    return undefined;
+  }
+
+  return typeof raw === "string"
+    ? (() => {
+      try {
+        return JSON.parse(raw);
+      } catch {
+        return undefined;
+      }
+    })()
+    : raw;
+};
+
 export function convertStorageHintsToRuntime(
   storageHints: SpeakerHintStorage[],
   wordIdToIndex: Map<string, number>,
 ): RuntimeSpeakerHint[] {
   const hints: RuntimeSpeakerHint[] = [];

   storageHints.forEach((hint) => {
     // ... existing validation ...

     if (hint.type === "provider_speaker_index") {
       const parsed = parseProviderSpeakerIndex(hint.value);
       // ...
     } else if (hint.type === "user_speaker_assignment") {
-      const data = typeof hint.value === "string"
-        ? (() => {
-          try {
-            return JSON.parse(hint.value);
-          } catch {
-            return undefined;
-          }
-        })()
-        : hint.value;
+      const data = parseJsonValue(hint.value);

       if (data && typeof data === "object" && "human_id" in data && typeof data.human_id === "string") {
         // ...
       }
     }
   });
 }

 const parseProviderSpeakerIndex = (raw: unknown): ProviderSpeakerIndexHint | undefined => {
-  if (raw == null) {
-    return undefined;
-  }
-
-  const data = typeof raw === "string"
-    ? (() => {
-      try {
-        return JSON.parse(raw);
-      } catch {
-        return undefined;
-      }
-    })()
-    : raw;
+  const data = parseJsonValue(raw);
   return providerSpeakerIndexSchema.safeParse(data).data;
 };
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19be004 and 7f06793.

📒 Files selected for processing (16)
  • apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (2 hunks)
  • apps/desktop/src/components/main/body/sessions/note-input/transcript/index.tsx (1 hunks)
  • apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (6 hunks)
  • apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1 hunks)
  • apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (4 hunks)
  • apps/desktop/src/devtool/seed/data/curated.json (1 hunks)
  • apps/desktop/src/utils/segment.test.ts (1 hunks)
  • apps/desktop/src/utils/segment.ts (3 hunks)
  • apps/desktop/src/utils/speaker-hints.ts (1 hunks)
  • owhisper/owhisper-client/src/lib.rs (1 hunks)
  • owhisper/owhisper-interface/src/stream.rs (1 hunks)
  • plugins/listener/src/actors/listener.rs (6 hunks)
  • plugins/listener/src/actors/mod.rs (1 hunks)
  • plugins/listener/src/actors/processor.rs (7 hunks)
  • plugins/listener/src/actors/session.rs (6 hunks)
  • plugins/listener/src/actors/source.rs (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (12)
apps/desktop/src/components/main/body/sessions/note-input/transcript/index.tsx (1)
apps/desktop/src/contexts/listener.tsx (1)
  • useListener (33-47)
plugins/listener/src/actors/session.rs (2)
plugins/listener/src/actors/listener.rs (1)
  • name (51-53)
plugins/listener/src/actors/source.rs (1)
  • name (48-50)
owhisper/owhisper-interface/src/stream.rs (1)
plugins/listener/js/bindings.gen.ts (2)
  • StreamResponse (98-98)
  • Extra (93-93)
apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (1)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (1)
  • TranscriptContainer (25-104)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (2)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1)
  • Operations (1-4)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (1)
  • SegmentHeader (18-106)
apps/desktop/src/utils/speaker-hints.ts (1)
packages/db/src/schema.ts (2)
  • ProviderSpeakerIndexHint (287-287)
  • providerSpeakerIndexSchema (281-285)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (3)
apps/desktop/src/utils/segment.ts (1)
  • Segment (31-34)
apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/operations.tsx (1)
  • Operations (1-4)
packages/db/src/schema.ts (1)
  • humans (61-74)
apps/desktop/src/utils/segment.test.ts (1)
apps/desktop/src/utils/segment.ts (2)
  • SegmentKey (36-40)
  • SegmentKey (42-46)
plugins/listener/src/actors/processor.rs (1)
crates/audio-utils/src/lib.rs (1)
  • f32_to_i16_bytes (51-61)
plugins/listener/src/actors/listener.rs (2)
crates/ws/src/client.rs (1)
  • finalize_with_text (23-27)
owhisper/owhisper-interface/src/stream.rs (4)
  • default (60-67)
  • default (82-93)
  • apply_offset (141-160)
  • set_extra (162-166)
apps/desktop/src/utils/segment.ts (2)
plugins/db/js/bindings.gen.ts (1)
  • SpeakerIdentity (200-200)
packages/db/src/schema.ts (2)
  • words (130-141)
  • speakerHints (144-154)
plugins/listener/src/actors/source.rs (3)
crates/audio/src/lib.rs (1)
  • is_using_headphone (204-217)
plugins/listener/src/actors/listener.rs (1)
  • name (51-53)
plugins/listener/src/actors/processor.rs (1)
  • name (53-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci (macos, macos-14)
🔇 Additional comments (20)
plugins/listener/src/actors/source.rs (2)

8-8: LGTM! Clean addition of ChannelMode state and query capability.

The GetMode RPC pattern matches the existing GetMicMute/GetMicDevice patterns, and initialization to Dual is appropriate.

Also applies to: 23-23, 42-42, 122-122


161-165: LGTM! GetMode handler follows established patterns.

The implementation correctly returns the current mode via RPC reply, consistent with other getter handlers.

plugins/listener/src/actors/processor.rs (5)

11-11: LGTM! Clean mode-driven architecture additions.

The SetMode/Reset message variants and mode state field establish a clear control flow for switching between Single and Dual processing modes.

Also applies to: 20-21, 36-37, 76-76


39-47: LGTM! Comprehensive pipeline reset.

The reset_pipeline method properly clears all stateful components including joiner queues, cached audio, AGC instances, and timing state.


99-107: LGTM! Proper mode change handling with conditional reset.

The SetMode handler correctly checks whether the mode has actually changed before resetting the pipeline, avoiding unnecessary disruption.


131-147: LGTM! Mode-dependent audio routing is correctly implemented.

In Single mode, spk_bytes contains the mixed (mic + spk) audio, which is appropriate for single-channel transcription. The clamping prevents overflow/clipping.

Note: RecorderActor (lines 118-123) always receives mixed audio regardless of mode, which appears intentional.


195-205: LGTM! Efficient silence caching and queue overflow protection.

The get_silence method properly caches zero buffers to avoid repeated allocations, and queue size limits with overflow warnings prevent unbounded memory growth.

Note: reset() correctly preserves the silence_cache since it's just an optimization cache.

Also applies to: 209-220

apps/desktop/src/components/main/body/sessions/note-input/transcript/editor.tsx (2)

4-4: LGTM!

The id utility import is correctly added to support hint ID generation.


58-61: LGTM!

The operations object correctly wires both onDeleteWord and onAssignSpeaker handlers to the TranscriptContainer.

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/index.tsx (4)

21-21: LGTM!

Correctly imports the centralized Operations type to replace the local WordOperations definition.


30-30: LGTM!

The refactoring to use the centralized Operations type is consistent throughout the component hierarchy (TranscriptContainer → RenderTranscript → SegmentRenderer → WordSpan).

Also applies to: 133-133, 182-182, 243-243


115-115: LGTM!

The decorative separator text change improves the visual distinction between transcript segments.


197-197: LGTM!

Correctly passes the operations prop to SegmentHeader, enabling the context menu functionality for speaker assignment.

apps/desktop/src/components/main/body/sessions/note-input/transcript/shared/segment-header.tsx (6)

4-16: LGTM!

Correctly imports ContextMenu components, the Operations type, and necessary utilities to support the speaker assignment UI.


18-18: LGTM!

The function signature correctly extends SegmentHeader to accept an optional operations prop, maintaining backward compatibility while enabling editor functionality.


44-59: LGTM!

Good implementation of editor mode detection and speaker assignment handler:

  • Properly derives mode based on operations presence
  • Safely filters words with IDs before non-null assertion
  • Correctly memoizes the assignment handler with appropriate dependencies

78-103: LGTM!

The context menu implementation demonstrates good UX practices:

  • Only enabled in editor mode when there are assignable words
  • Provides clear hierarchical menu structure (Assign Speaker → human list)
  • Gracefully handles empty humans list with a disabled item
  • Includes defensive fallback from human?.name to humanId

108-125: LGTM!

The useSegmentColor implementation correctly generates distinct colors per speaker using chroma-js with channel-based palettes and proper memoization.


127-148: LGTM!

The useSpeakerLabel hook demonstrates excellent defensive coding:

  • Prioritizes human names from the store when available
  • Falls back to channel-based labels using ChannelProfile enum
  • Provides sensible defaults for all cases (speaker index or channel letter)
apps/desktop/src/utils/speaker-hints.ts (1)

63-79: No breaking changes detected.

The verification confirms that parseProviderSpeakerIndex is only used internally within apps/desktop/src/utils/speaker-hints.ts (line 25), with no imports from other files in the codebase. The function is not exported, so the change is safe.

Comment on lines +208 to +214
let outbound = tokio_stream::StreamExt::map(
    tokio_stream::wrappers::ReceiverStream::new(rx),
    |msg| match msg {
        MixedMessage::Audio((_mic, spk)) => MixedMessage::Audio(spk),
        MixedMessage::Control(c) => MixedMessage::Control(c),
    },
);

⚠️ Potential issue | 🔴 Critical

Single-channel stream is forwarding the wrong audio track

In the single-mode branch we drop the mic buffers and forward the speaker channel instead. In Single mode the upstream Source only guarantees the first tuple element (mic); the second element is often empty. With the current mapping every single-channel session will stream silence (or the wrong track) to Deepgram, breaking transcription entirely. Please forward the mic data here.

-                    MixedMessage::Audio((_mic, spk)) => MixedMessage::Audio(spk),
+                    MixedMessage::Audio((mic, _spk)) => MixedMessage::Audio(mic),
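Stripped of the actor and stream plumbing, the corrected mapping can be shown with a plain function over a simplified `MixedMessage` (a stand-in for the plugin's real type, which carries audio chunks and control messages):

```rust
// Simplified stand-in types for illustration; the real MixedMessage lives
// in the listener plugin and is not defined here.
#[derive(Debug, PartialEq)]
enum MixedMessage<A, C> {
    Audio(A),
    Control(C),
}

/// In Single mode only the first tuple element (mic) is guaranteed to carry
/// audio, so the mapping must forward it and drop the speaker element.
fn to_single_channel(
    msg: MixedMessage<(Vec<f32>, Vec<f32>), String>,
) -> MixedMessage<Vec<f32>, String> {
    match msg {
        MixedMessage::Audio((mic, _spk)) => MixedMessage::Audio(mic),
        MixedMessage::Control(c) => MixedMessage::Control(c),
    }
}
```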

@yujonglee merged commit 20a3756 into main Nov 4, 2025
17 checks passed
@yujonglee deleted the speaker-assignment-suppoer branch November 4, 2025 12:20