Add logging for invalid WebSocket payloads and new properties #49
Conversation
- Introduced `LogConnectionClosedWithInvalidPyload` method in `MultiModalLiveClientLoggingExtensions` to log errors for WebSocket closures due to invalid payloads.
- Updated `MultiModalLiveClient` to log close status descriptions for invalid payloads.
- Added `OutputAudioTranscription` and `InputAudioTranscription` properties to `BidiGenerateContentSetup`, initialized with new instances.
- Created a new class `OutputAudioTranscription` for handling audio transcription data.
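The new setup surface described above can be sketched as a minimal usage example. This is an illustrative sketch only: the property and type names come from this PR's summary, while the surrounding object initializer is an assumption about how the setup message is constructed, not the library's documented API.

```csharp
// Hypothetical sketch: enabling transcription on the bidirectional setup
// message added in this PR. Only the property and type names are taken
// from the PR summary; everything else here is illustrative.
var setup = new BidiGenerateContentSetup
{
    // Transcribe the model's spoken (output) audio into text.
    OutputAudioTranscription = new OutputAudioTranscription(),

    // Transcribe the user's incoming (input) audio; note the PR reuses
    // the OutputAudioTranscription type for this property as well.
    InputAudioTranscription = new OutputAudioTranscription(),
};
```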
Walkthrough

A new logging method for handling invalid WebSocket payloads was introduced, along with its integration into the WebSocket client's disconnection logic. Additionally, the `BidiGenerateContentSetup` type gained `OutputAudioTranscription` and `InputAudioTranscription` properties, backed by a new `OutputAudioTranscription` class.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant MultiModalLiveClient
    participant Logger
    Client->>MultiModalLiveClient: ConnectAsync()
    MultiModalLiveClient-->>Client: Establish WebSocket connection
    Note over MultiModalLiveClient: WebSocket disconnects
    MultiModalLiveClient->>MultiModalLiveClient: Check close status
    alt close status is InvalidPayloadData
        MultiModalLiveClient->>Logger: LogConnectionClosedWithInvalidPyload(closeStatusDescription)
    else other close status
        MultiModalLiveClient->>Logger: Log generic connection closed
    end
    MultiModalLiveClient-->>Client: Disconnected event
```
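The branch in the diagram corresponds roughly to the following sketch. The enclosing handler, the `info` parameter type, and the generic `LogConnectionClosed` call are assumptions for illustration; only the `InvalidPayloadData` check and the `LogConnectionClosedWithInvalidPyload` call reflect this PR's actual change.

```csharp
// Illustrative sketch of the close-status dispatch shown in the diagram.
// DisconnectionInfo and LogConnectionClosed are assumed names, not
// necessarily the library's exact types.
private void HandleDisconnection(DisconnectionInfo info)
{
    if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
    {
        // New in this PR: log the close status description for invalid payloads.
        _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
    }
    else
    {
        // Generic connection-closed logging for all other close statuses.
        _logger?.LogConnectionClosed();
    }

    // Notify subscribers that the connection has ended.
    Disconnected?.Invoke(this, EventArgs.Empty);
}
```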
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1)

434-434: Remove redundant comment. The comment `//log info.CloseStatusDescription` is redundant since the code below clearly shows what's being logged.

src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1)

53-56: Implement the `OutputAudioTranscription` class. The `OutputAudioTranscription` class is currently empty, which appears to be a placeholder. Please implement the required properties based on the API specification. Do you want me to help generate the implementation based on the Gemini API documentation for audio transcription properties?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- src/GenerativeAI.Live/Logging/LoggingExtensions.cs (1 hunks)
- src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1 hunks)
- src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1 hunks)
🔇 Additional comments (1)
src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1)
48-48: Consider the default initialization inconsistency. `OutputAudioTranscription` is initialized with a new instance by default, while `InputAudioTranscription` is not. This inconsistency might be intentional, but please verify if both should have the same initialization behavior.
```csharp
[LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);
```
Fix the typo in the method name.
The method name LogConnectionClosedWithInvalidPyload has a typo - it should be LogConnectionClosedWithInvalidPayload (missing 'a' in "Payload").
Apply this diff to fix the typo:

```diff
- [LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
- public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);
+ [LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
+ public static partial void LogConnectionClosedWithInvalidPayload(this ILogger logger, string closeStatusDescription);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```csharp
[LoggerMessage(EventId = 113, Level = LogLevel.Error, Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
public static partial void LogConnectionClosedWithInvalidPayload(this ILogger logger, string closeStatusDescription);
```
🤖 Prompt for AI Agents
In src/GenerativeAI.Live/Logging/LoggingExtensions.cs at lines 51 to 52, the
method name LogConnectionClosedWithInvalidPyload contains a typo. Rename the
method to LogConnectionClosedWithInvalidPayload by adding the missing 'a' in
"Payload" to correct the spelling.
```csharp
else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
{
    //log info.CloseStatusDescription
    _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
}
```
💡 Verification agent
🧩 Analysis chain
Fix the typo and verify null safety.

Two issues in this code segment:

- Same typo as in the logging extensions: `LogConnectionClosedWithInvalidPyload` should be `LogConnectionClosedWithInvalidPayload`.
- The null-forgiving operator (`!`) on `CloseStatusDescription` could be risky if the value is actually null.

Apply this diff to fix the typo:

```diff
- _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
+ _logger?.LogConnectionClosedWithInvalidPayload(info.CloseStatusDescription!);
```

Verify that CloseStatusDescription is never null when CloseStatus is InvalidPayloadData:
🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if CloseStatusDescription can be null for InvalidPayloadData scenarios

# Search for WebSocketCloseStatus.InvalidPayloadData usage patterns
rg -A 5 -B 5 "InvalidPayloadData"

# Look for any null checks on CloseStatusDescription
rg -A 3 -B 3 "CloseStatusDescription"
```
rg -A 3 -B 3 "CloseStatusDescription"Length of output: 2680
Fix typo, handle possible null, and update extension method

Two issues found in the InvalidPayloadData branch:

- Typo in the logging call and its extension: `InvalidPyload` → `InvalidPayload`.
- Unsafe null-forgiving operator on `info.CloseStatusDescription`.

Please update both the call site and the extension definition, and guard against a null description. Example diff:
In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs:

```diff
- else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
- {
-     //log info.CloseStatusDescription
-     _logger?.LogConnectionClosedWithInvalidPyload(info.CloseStatusDescription!);
- }
+ else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
+ {
+     // ensure description is never null
+     var description = info.CloseStatusDescription ?? "No payload description";
+     _logger?.LogConnectionClosedWithInvalidPayload(description);
+ }
```

In src/GenerativeAI.Live/Logging/LoggingExtensions.cs:

```diff
- [LoggerMessage(EventId = 113, Level = LogLevel.Error,
-     Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
- public static partial void LogConnectionClosedWithInvalidPyload(this ILogger logger, string closeStatusDescription);
+ [LoggerMessage(EventId = 113, Level = LogLevel.Error,
+     Message = "WebSocket connection closed caused by invalid payload: {CloseStatusDescription}")]
+ public static partial void LogConnectionClosedWithInvalidPayload(this ILogger logger, string closeStatusDescription);
```

- Rename both the call and the extension to "InvalidPayload".
- Replace `!` with `??` or another null-safe default.
- (Optional) Verify whether a `Disconnected` event should also be raised for this case per your connection lifecycle.
📝 Committable suggestion
```csharp
else if (info.CloseStatus == WebSocketCloseStatus.InvalidPayloadData)
{
    // ensure description is never null
    var description = info.CloseStatusDescription ?? "No payload description";
    _logger?.LogConnectionClosedWithInvalidPayload(description);
}
```
🤖 Prompt for AI Agents
In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs around lines 432 to 436,
fix the typo in the logging call from LogConnectionClosedWithInvalidPyload to
LogConnectionClosedWithInvalidPayload. Remove the null-forgiving operator on
info.CloseStatusDescription and instead provide a null-safe fallback value
(e.g., an empty string or a default message) to avoid potential null reference
exceptions. Also update the corresponding extension method in
src/GenerativeAI.Live/Logging/LoggingExtensions.cs to rename the method
accordingly and handle null inputs safely.
```csharp
[JsonPropertyName("outputAudioTranscription")]
public OutputAudioTranscription? OutputAudioTranscription { get; set; } = new OutputAudioTranscription();

[JsonPropertyName("inputAudioTranscription")]
public OutputAudioTranscription? InputAudioTranscription { get; set; }
```
💡 Verification agent
🧩 Analysis chain
Reconsider the type design for input and output transcription.

Both `OutputAudioTranscription` and `InputAudioTranscription` properties use the same `OutputAudioTranscription` type. This seems inconsistent and potentially confusing. Consider:

- Creating separate types for input and output transcription if they have different structures.
- Using a more generic type name like `AudioTranscription` if they are truly identical.
- Adding XML documentation to clarify the purpose and differences between these properties.

Verify the intended design by checking the API documentation or related usage:
🌐 Web query:
What are the differences between inputAudioTranscription and outputAudioTranscription in the Gemini Multimodal Live API setup configuration?
💡 Result:
Differences Between inputAudioTranscription and outputAudioTranscription in the Gemini Multimodal Live API
The Gemini Multimodal Live API provides configuration options for real-time transcription of both incoming and outgoing audio. In the session setup, two distinct configuration fields manage these behaviors: inputAudioTranscription and outputAudioTranscription. Here are their key differences:
inputAudioTranscription
- Purpose: Enables transcription of audio input sent from the client (usually the user's speech or audio data).
- Function: When set, the API automatically processes and transcribes the incoming audio stream into text that aligns with the language of the input audio.
- Use Case: Useful for capturing and understanding what the user says in real time; for example, to display live captions of user speech or for further processing by the model.
- Language Alignment: The transcription is performed in the language detected or specified for the input audio[1][5].
outputAudioTranscription
- Purpose: Enables transcription of audio output generated by the model (the assistant's spoken responses).
- Function: When enabled, the API provides a text transcription of the audio that the model generates as its response, aligning with the output audio's language code.
- Use Case: Useful for displaying the assistant's spoken responses as text, offering accessibility for users who need visual confirmation, or enabling further text-based processing.
- Language Alignment: The transcription matches the language set for the model's audio output, which can be explicitly defined in the output configuration[1][5].
Summary Table
| Feature | inputAudioTranscription | outputAudioTranscription |
|---|---|---|
| Transcribes | Incoming (user) audio | Outgoing (assistant/model) audio |
| Text aligns with | Input audio language | Output audio language code |
| Use case | Capture/understand user speech in real time | Display/understand model's spoken responses in text |
| Activated by | Including `input_audio_transcription` in setup | Including `output_audio_transcription` in setup |
Typical Use in Configuration
In a typical session configuration, both can be enabled simultaneously for full conversational visibility:

```python
config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
    "output_audio_transcription": {},
}
```

- This setup will provide both the user's spoken input and the assistant's spoken output as synchronized text streams, facilitating live transcription on both ends[5].
Practical Example

- When a user speaks, `inputAudioTranscription` provides a real-time transcript of their speech.
- When the model replies via synthesized speech, `outputAudioTranscription` delivers the text equivalent of that synthesized audio.
Summary

- `inputAudioTranscription`: Transcribes and exposes the user's input speech.
- `outputAudioTranscription`: Transcribes and exposes the model's spoken output, strictly following the language code specified for output.
This dual transcription capability ensures clear, accessible, and auditable interactions in voice or multimodal applications using the Gemini Multimodal Live API[1][5].
Citations:
- 1: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live
- 2: https://ai.google.dev/gemini-api/docs/live
- 3: https://docs.pipecat.ai/server/services/s2s/gemini
- 4: https://ai.google.dev/gemini-api/docs/audio
- 5: https://cloud.google.com/vertex-ai/generative-ai/docs/live-api
- 6: https://www.youtube.com/watch?v=LMhe2egLsrQ
- 7: https://www.youtube.com/watch?v=1eybCQYOJnY
- 8: https://www.googlecloudcommunity.com/gc/Gemini-Code-Assist/Summary-after-audio-conversation-in-Gemini-s-multimodal-live-API/m-p/854353
- 9: https://www.youtube.com/watch?v=MGHwe7j2TyI
- 10: https://firebase.google.com/docs/ai-logic/analyze-audio
Refactor audio-transcription types and property initialization

Both input and output transcription share the same (currently empty) configuration shape, so the OutputAudioTranscription type name is misleading and the two properties shouldn't be initialized inconsistently. Please:

- Rename the placeholder class to a more generic name, e.g.:

```csharp
/// <summary>
/// Configuration for transcribing audio streams.
/// </summary>
public class AudioTranscription
{
}
```

- Update your properties in BidiGenerateContentSetup.cs (around lines 47–51) to use this new type and remove the default instantiation:

```diff
- [JsonPropertyName("outputAudioTranscription")]
- public OutputAudioTranscription? OutputAudioTranscription { get; set; } = new OutputAudioTranscription();
+ /// <summary>
+ /// Transcribes the assistant's outgoing audio into text.
+ /// </summary>
+ [JsonPropertyName("outputAudioTranscription")]
+ public AudioTranscription? OutputAudioTranscription { get; set; }

- [JsonPropertyName("inputAudioTranscription")]
- public OutputAudioTranscription? InputAudioTranscription { get; set; }
+ /// <summary>
+ /// Transcribes the user's incoming audio into text.
+ /// </summary>
+ [JsonPropertyName("inputAudioTranscription")]
+ public AudioTranscription? InputAudioTranscription { get; set; }
```

- Add XML documentation on both the class and its properties to clarify their distinct roles.

These changes will make the intent clear, avoid confusion over type reuse, and enforce explicit configuration for each transcription stream.
🤖 Prompt for AI Agents
In src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs around
lines 47 to 51, rename the OutputAudioTranscription class to a more generic name
like AudioTranscription to reflect its shared configuration role. Update both
InputAudioTranscription and OutputAudioTranscription properties to use this new
AudioTranscription type and remove the default initialization on
OutputAudioTranscription to keep consistency. Add XML documentation comments on
the AudioTranscription class and on both properties to clearly describe their
distinct purposes for input and output audio transcription configurations.