
Conversation


@yarns123 yarns123 commented Jun 2, 2025

  • Introduced `LogConnectionClosedWithInvalidPyload` method in `MultiModalLiveClientLoggingExtensions` to log errors for WebSocket closures due to invalid payloads.
  • Updated `MultiModalLiveClient` to log close status descriptions for invalid payloads.
  • Added `OutputAudioTranscription` and `InputAudioTranscription` properties to `BidiGenerateContentSetup`, initialized with new instances.
  • Created a new class `OutputAudioTranscription` for handling audio transcription data.

Summary by CodeRabbit

  • New Features
    • Added options for handling input and output audio transcription in content generation setup.
  • Bug Fixes
    • Improved error logging for WebSocket connections closed due to invalid payloads, providing more detailed information.


coderabbitai bot commented Jun 2, 2025

Walkthrough

A new logging method for handling invalid WebSocket payloads was introduced, along with its integration into the WebSocket client's disconnection logic. Additionally, the BidiGenerateContentSetup class was extended with two nullable properties for audio transcription, and a new OutputAudioTranscription class was added.

Changes

| File(s) | Change Summary |
| --- | --- |
| src/GenerativeAI.Live/Logging/LoggingExtensions.cs | Added `LogConnectionClosedWithInvalidPyload` logging method to handle WebSocket invalid-payload disconnections. |
| src/GenerativeAI.Live/Models/MultiModalLiveClient.cs | Integrated the new logging method in the WebSocket disconnect handler for the `InvalidPayloadData` close status. |
| src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs | Added `OutputAudioTranscription` and `InputAudioTranscription` properties; introduced the `OutputAudioTranscription` class. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MultiModalLiveClient
    participant Logger

    Client->>MultiModalLiveClient: ConnectAsync()
    MultiModalLiveClient-->>Client: Establish WebSocket connection
    Note over MultiModalLiveClient: WebSocket disconnects
    MultiModalLiveClient->>MultiModalLiveClient: Check close status
    alt close status is InvalidPayloadData
        MultiModalLiveClient->>Logger: LogConnectionClosedWithInvalidPyload(closeStatusDescription)
    else other close status
        MultiModalLiveClient->>Logger: Log generic connection closed
    end
    MultiModalLiveClient-->>Client: Disconnected event
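For concreteness, here is a minimal sketch of the two pieces this flow exercises, assembled from the diff hunks quoted in the review comments below (identifiers exactly as they appear in the PR, including the `Pyload` typo flagged there):

```csharp
// LoggingExtensions.cs — source-generated logging method (sketch from the PR diff)
[LoggerMessage(EventId = 113, Level = LogLevel.Error,
    Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
public static partial void LogConnectionClosedWithInvalidPyload(
    this ILogger logger, string closeStatusDescription);

// MultiModalLiveClient.cs — disconnect handler branch (sketch from the PR diff)
else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
{
    _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
}
```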

Poem

A bunny logs with careful paws,
Invalid payloads get their cause.
New properties for sound and voice,
Transcription added—what a choice!
When WebSockets close, we know just why,
With logging neat, we hop right by.
🐇


coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1)

434-434: Remove redundant comment.

The comment //log info.CloseStatusDescription is redundant since the code below clearly shows what's being logged.

-                //log info.CloseStatusDescription
src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1)

53-56: Implement the OutputAudioTranscription class.

The OutputAudioTranscription class is currently empty, which appears to be a placeholder. Please implement the required properties based on the API specification.

Do you want me to help generate the implementation based on the Gemini API documentation for audio transcription properties?
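If the Live API treats these configs as presence-only toggles — the empty `{}` objects in the Gemini documentation quoted later in this thread point that way — an empty class may be intentional and need only XML documentation, e.g.:

```csharp
/// <summary>
/// Opt-in configuration for transcribing the model's generated audio.
/// Presence of this object enables transcription; the API appears to
/// define no fields on it (assumption based on the docs quoted below).
/// </summary>
public class OutputAudioTranscription
{
}
```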

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f68457 and 58590a5.

📒 Files selected for processing (3)
  • src/GenerativeAI.Live/Logging/LoggingExtensions.cs (1 hunks)
  • src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1 hunks)
  • src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1 hunks)
🔇 Additional comments (1)
src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1)

48-48: Consider the default initialization inconsistency.

OutputAudioTranscription is initialized with a new instance by default, while InputAudioTranscription is not. This inconsistency might be intentional, but please verify if both should have the same initialization behavior.
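To see the practical effect, a minimal, self-contained sketch (the `JsonSerializerOptions` here are an assumption; the repository may configure System.Text.Json differently):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class OutputAudioTranscription { }

public class BidiGenerateContentSetup
{
    [JsonPropertyName("outputAudioTranscription")]
    public OutputAudioTranscription? OutputAudioTranscription { get; set; }
        = new OutputAudioTranscription();

    [JsonPropertyName("inputAudioTranscription")]
    public OutputAudioTranscription? InputAudioTranscription { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var options = new JsonSerializerOptions
        {
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };
        // Prints {"outputAudioTranscription":{}} — a default setup silently
        // opts in to output transcription while leaving input transcription off.
        Console.WriteLine(JsonSerializer.Serialize(new BidiGenerateContentSetup(), options));
    }
}
```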

Comment on lines +51 to +52
[LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);

⚠️ Potential issue

Fix the typo in the method name.

The method name LogConnectionClosedWithInvalidPyload has a typo - it should be LogConnectionClosedWithInvalidPayload (missing 'a' in "Payload").

Apply this diff to fix the typo:

-    [LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
-    public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);
+    [LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
+    public static partial void LogConnectionClosedWithInvalidPayload(this ILogger logger, string closeStatusDescription);
🤖 Prompt for AI Agents
In src/GenerativeAI.Live/Logging/LoggingExtensions.cs at lines 51 to 52, the
method name LogConnectionClosedWithInvalidPyload contains a typo. Rename the
method to LogConnectionClosedWithInvalidPayload by adding the missing 'a' in
"Payload" to correct the spelling.

Comment on lines +432 to +436
else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
{
    //log info.CloseStatusDescription
    _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
}

💡 Verification agent

🧩 Analysis chain

Fix the typo and verify null safety.

Two issues in this code segment:

  1. Same typo as in the logging extensions: LogConnectionClosedWithInvalidPyload should be LogConnectionClosedWithInvalidPayload
  2. The null-forgiving operator (!) on CloseStatusDescription could be risky if the value is actually null

Apply this diff to fix the typo:

-                _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
+                _logger?.LogConnectionClosedWithInvalidPayload(info.CloseStatusDescription!);

Verify that CloseStatusDescription is never null when CloseStatus is InvalidPayloadData:


🏁 Script executed:

#!/bin/bash
# Description: Check if CloseStatusDescription can be null for InvalidPayloadData scenarios

# Search for WebSocketCloseStatus.InvalidPayloadData usage patterns
rg -A 5 -B 5 "InvalidPayloadData"

# Look for any null checks on CloseStatusDescription
rg -A 3 -B 3 "CloseStatusDescription"

Length of output: 2680


Fix typo, handle possible null, and update extension method

Two issues found in InvalidPayloadData branch:

  • Typo in the logging call and its extension: InvalidPyload → InvalidPayload
  • Unsafe null-forgiving operator on info.CloseStatusDescription

Please update both call site and extension definition, and guard against a null description. Example diff:

In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs:

-            else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
-            {
-                //log info.CloseStatusDescription
-                _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
-            }
+            else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
+            {
+                // ensure description is never null
+                var description = info.CloseStatusDescription ?? "No payload description";
+                _logger?.LogConnectionClosedWithInvalidPayload(description);
+            }

In src/GenerativeAI.Live/Logging/LoggingExtensions.cs:

-    [LoggerMessage(EventId = 113, Level = LogLevel.Error,
-                   Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
-    public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);
+    [LoggerMessage(EventId = 113, Level = LogLevel.Error,
+                   Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
+    public static partial void LogConnectionClosedWithInvalidPayload(this ILogger logger, string closeStatusDescription);
  • Rename both call and extension to “InvalidPayload”
  • Replace ! with ?? or another null‐safe default
  • (Optional) Verify whether a Disconnected event should also be raised for this case per your connection lifecycle
🤖 Prompt for AI Agents
In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs around lines 432 to 436,
fix the typo in the logging call from LogConnectionClosedWithInvalidPyload to
LogConnectionClosedWithInvalidPayload. Remove the null-forgiving operator on
info.CloseStatusDescription and instead provide a null-safe fallback value
(e.g., an empty string or a default message) to avoid potential null reference
exceptions. Also update the corresponding extension method in
src/GenerativeAI.Live/Logging/LoggingExtensions.cs to rename the method
accordingly and handle null inputs safely.

Comment on lines +47 to +51
[JsonPropertyName("outputAudioTranscription")]
public OutputAudioTranscription? OutputAudioTranscription { get; set; } = new OutputAudioTranscription();

[JsonPropertyName("inputAudioTranscription")]
public OutputAudioTranscription? InputAudioTranscription { get; set; }

💡 Verification agent

🧩 Analysis chain

Reconsider the type design for input and output transcription.

Both OutputAudioTranscription and InputAudioTranscription properties use the same OutputAudioTranscription type. This seems inconsistent and potentially confusing. Consider:

  1. Creating separate types for input and output transcription if they have different structures
  2. Using a more generic type name like AudioTranscription if they are truly identical
  3. Adding XML documentation to clarify the purpose and differences between these properties

Verify the intended design by checking the API documentation or related usage:


🌐 Web query:

What are the differences between inputAudioTranscription and outputAudioTranscription in the Gemini Multimodal Live API setup configuration?

💡 Result:

Differences Between inputAudioTranscription and outputAudioTranscription in the Gemini Multimodal Live API

The Gemini Multimodal Live API provides configuration options for real-time transcription of both incoming and outgoing audio. In the session setup, two distinct configuration fields manage these behaviors: inputAudioTranscription and outputAudioTranscription. Here are their key differences:

inputAudioTranscription

  • Purpose: Enables transcription of audio input sent from the client (usually the user's speech or audio data).
  • Function: When set, the API automatically processes and transcribes the incoming audio stream into text that aligns with the language of the input audio.
  • Use Case: Useful for capturing and understanding what the user says in real time; for example, to display live captions of user speech or for further processing by the model.
  • Language Alignment: The transcription is performed in the language detected or specified for the input audio[1][5].

outputAudioTranscription

  • Purpose: Enables transcription of audio output generated by the model (the assistant's spoken responses).
  • Function: When enabled, the API provides a text transcription of the audio that the model generates as its response, aligning with the output audio's language code.
  • Use Case: Useful for displaying the assistant's spoken responses as text, offering accessibility for users who need visual confirmation, or enabling further text-based processing.
  • Language Alignment: The transcription matches the language set for the model's audio output, which can be explicitly defined in the output configuration[1][5].

Summary Table

| Feature | inputAudioTranscription | outputAudioTranscription |
| --- | --- | --- |
| Transcribes | Incoming (user) audio | Outgoing (assistant/model) audio |
| Text aligns with | Input audio language | Output audio language code |
| Use case | Capture/understand user speech in real time | Display/understand the model's spoken responses as text |
| Activated by | Including input_audio_transcription in setup | Including output_audio_transcription in setup |

Typical Use in Configuration

In a typical session configuration, both can be enabled simultaneously for full conversational visibility:

config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
    "output_audio_transcription": {}
}
  • This setup will provide both the user's spoken input and the assistant's spoken output as synchronized text streams, facilitating live transcription on both ends[5].

Practical Example

  • When a user speaks, inputAudioTranscription provides a real-time transcript of their speech.
  • When the model replies via synthesized speech, outputAudioTranscription delivers the text equivalent of that synthesized audio.

Summary

  • inputAudioTranscription: Transcribes and exposes the user's input speech.
  • outputAudioTranscription: Transcribes and exposes the model's spoken output, strictly following the language code specified for output.

This dual transcription capability ensures clear, accessible, and auditable interactions in voice or multimodal applications using the Gemini Multimodal Live API[1][5].
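Mapped onto this PR's types, enabling both streams would presumably look like the sketch below — note that both properties currently share the OutputAudioTranscription type, which the resolution that follows addresses:

```csharp
// Hedged sketch: presence-based opt-in, mirroring the empty {} objects above.
var setup = new BidiGenerateContentSetup
{
    InputAudioTranscription = new OutputAudioTranscription(),  // shared type (see below)
    OutputAudioTranscription = new OutputAudioTranscription(),
};
```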


Refactor audio‐transcription types and property initialization

Both input and output transcription share the same (currently empty) configuration shape, so the OutputAudioTranscription type name is misleading and the two properties shouldn’t be initialized inconsistently. Please:

  • Rename the placeholder class to a more generic name, e.g.:
    /// <summary>
    /// Configuration for transcribing audio streams.
    /// </summary>
    public class AudioTranscription { }
  • Update your properties in BidiGenerateContentSetup.cs (around lines 47–51) to use this new type and remove the default instantiation:
    - [JsonPropertyName("outputAudioTranscription")]
    - public OutputAudioTranscription? OutputAudioTranscription { get; set; } = new OutputAudioTranscription();
    + /// <summary>
    + /// Transcribes the assistant’s outgoing audio into text.
    + /// </summary>
    + [JsonPropertyName("outputAudioTranscription")]
    + public AudioTranscription? OutputAudioTranscription { get; set; }
    
    - [JsonPropertyName("inputAudioTranscription")]
    - public OutputAudioTranscription? InputAudioTranscription { get; set; }
    + /// <summary>
    + /// Transcribes the user’s incoming audio into text.
    + /// </summary>
    + [JsonPropertyName("inputAudioTranscription")]
    + public AudioTranscription? InputAudioTranscription { get; set; }
  • Add XML documentation on both the class and its properties to clarify their distinct roles.

These changes will make the intent clear, avoid confusion over type reuse, and enforce explicit configuration for each transcription stream.

🤖 Prompt for AI Agents
In src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs around
lines 47 to 51, rename the OutputAudioTranscription class to a more generic name
like AudioTranscription to reflect its shared configuration role. Update both
InputAudioTranscription and OutputAudioTranscription properties to use this new
AudioTranscription type and remove the default initialization on
OutputAudioTranscription to keep consistency. Add XML documentation comments on
the AudioTranscription class and on both properties to clearly describe their
distinct purposes for input and output audio transcription configurations.
