Skip to content

Comments

conversation.item.done event implementation#451

Merged
KaQuMiQ merged 2 commits intomainfrom
feature/conversation_item_done_event
Oct 29, 2025
Merged

conversation.item.done event implementation#451
KaQuMiQ merged 2 commits intomainfrom
feature/conversation_item_done_event

Conversation

@marcinwolder-miquido
Copy link
Collaborator

Description

Aligned realtime memory updates with OpenAI’s event flow so we only store finalized assistant outputs and capture richer payloads.

  • Treat conversation.item.done as the authoritative signal for assistant completions; skip incomplete
    items instead of re-fetching prematurely.
  • Keep conversation.item.retrieved for user messages and continue refetching until completion.
  • Persist assistant audio and transcript parts by enhancing _assistant_content_to_multimodal to
    append audio blobs (with the correct MIME type) and transcript text tagged via metadata.

@coderabbitai
Copy link

coderabbitai bot commented Oct 20, 2025

Walkthrough

The change updates OpenAI Realtime event handling and multimodal content assembly in src/draive/openai/realtime.py. It adds a handler for "conversation.item.done" that processes only items with an id and type "message", and triggers retrieval for assistant messages. The existing "conversation.item.retrieved" path is retained and clarified to operate only on messages, with logic to request completion when an item is not completed. Memory updates for user and assistant items now use multimodal builders; _assistant_content_to_multimodal was extended to include audio (ResourceContent with mime_type) and transcript (TextContent with transcript meta) parts alongside text.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The diff is limited to a single file but contains multiple related changes: added event branch, adjusted retrieval/ retry flow, and expanded multimodal content assembly. Review requires checking conditional flows, memory updates, and new content part construction.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "conversation.item.done event implementation" directly refers to a significant change in the pull request. The raw summary confirms that conversation.item.done event handling was added to process assistant messages, and the PR objectives identify treating this event as the authoritative signal for assistant completions as a key goal. However, the changeset encompasses broader objectives including multimodal content expansion, memory update adjustments, and retrieval flow modifications. The title captures one important aspect of the implementation but does not reflect the full scope of "aligning realtime memory updates with OpenAI's event flow," which is the stated primary objective.
Description Check ✅ Passed The PR description is clearly related to the changeset. It identifies the core objective of aligning realtime memory updates with OpenAI's event flow and directly maps to the changes described in the raw summary: handling conversation.item.done as an authoritative signal, retaining conversation.item.retrieved for user messages, and enhancing multimodal builders to support audio and transcript parts. The description provides specific, actionable details about what was changed and why, with no vague or generic language that would obscure the intent.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/conversation_item_done_event

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/draive/openai/realtime.py (1)

344-344: Consider optimizing retrieve calls for user items only.

Since conversation.item.retrieved now only handles user messages (line 398), and assistant messages use conversation.item.done instead, calling retrieve for assistant items at line 344 generates events that are immediately discarded at line 417-418.

Optimization:

-                            # request the full item to be stored in the memory
-                            await connection.conversation.item.retrieve(item_id=event.item.id)
+                            # request the full item to be stored in the memory (user items only)
+                            if event.item.role == "user":
+                                await connection.conversation.item.retrieve(item_id=event.item.id)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5675e21 and 4a254a8.

📒 Files selected for processing (1)
  • src/draive/openai/realtime.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use Python 3.12+ features and syntax across the codebase
Format code exclusively with Ruff (make format); do not use other formatters
Skip module-level docstrings

Files:

  • src/draive/openai/realtime.py
src/draive/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/**/*.py: Import Haiway symbols directly (from haiway import State, ctx)
Use ctx.scope(...) to bind scoped Disposables and active State; avoid global state
Route all logs through ctx.log_debug/info/warn/error; do not use print
Use latest, most strict typing syntax (Python 3.12+), with strict typing only for public APIs
Avoid loose Any except at explicit third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamic getattr except at narrow boundaries
Prefer Mapping/Sequence/Iterable in public types over dict/list/set
Use final where applicable; avoid inheritance and prefer composition
Use precise unions (|) and narrow with match/isinstance; avoid cast unless provably safe and localized
Model immutable data/config and facades with haiway.State; provide ergonomic classmethods like .of(...)
Avoid in-place mutation; use State.updated(...) or functional builders to produce new instances
Access active state via haiway.ctx inside async scopes (ctx.scope(...))
Use @statemethod for public state methods that dispatch on the active instance
Log around generation calls, tool dispatch, and provider requests/responses without leaking secrets; prefer structured/concise messages
Add metrics via ctx.record where applicable
All I/O is async; keep boundaries async and use ctx.spawn for detached tasks
Use structured concurrency and valid coroutine usage; rely on haiway/asyncio; avoid custom threading
Construct multimodal content with MultimodalContent.of(...) and compose blocks explicitly
Use ResourceContent/ResourceReference for media/data blobs
Wrap custom types/data within ArtifactContent; use hidden when needed
Add NumPy-style docstrings for public symbols with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings on internal helpers; keep names self-explanatory
Keep docstrings high-quality; mkdocstrings pulls them into API reference
Never log secrets or full request bodies containing keys/tokens

Files:

  • src/draive/openai/realtime.py
src/draive/{openai,anthropic,mistral,gemini,vllm,ollama,bedrock,cohere}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/{openai,anthropic,mistral,gemini,vllm,ollama,bedrock,cohere}/**/*.py: Provider-specific feature modules live under their respective provider directories
Translate provider/SDK errors into typed exceptions; do not raise bare Exception and preserve context
Use environment variables for credentials and resolve via helper functions like getenv_str

Files:

  • src/draive/openai/realtime.py
🧬 Code graph analysis (1)
src/draive/openai/realtime.py (4)
src/draive/models/types.py (14)
  • content (474-478)
  • content (618-622)
  • ModelOutput (573-672)
  • of (272-294)
  • of (315-347)
  • of (375-411)
  • of (444-467)
  • of (534-558)
  • of (587-611)
  • of (735-761)
  • of (790-816)
  • of (843-870)
  • of (935-963)
  • ModelInput (429-501)
src/draive/multimodal/content.py (1)
  • audio (161-176)
src/draive/resources/types.py (1)
  • ResourceContent (126-212)
src/draive/multimodal/text.py (1)
  • TextContent (11-82)
🔇 Additional comments (3)
src/draive/openai/realtime.py (3)

362-379: LGTM! Assistant message handling is correct.

The switch to ModelOutput.of() with _assistant_content_to_multimodal properly captures assistant completions as outputs. Error handling preserves context by re-raising after logging.


384-418: LGTM! User message retrieval flow is correct.

The new handler properly uses ModelInput.of() with _user_content_to_multimodal for user messages, and the retry logic at lines 392-394 ensures incomplete items are re-fetched.


862-876: LGTM! Audio and transcript handling implemented correctly.

The new audio and transcript processing correctly mirrors the pattern in _user_content_to_multimodal (lines 833-847), using ResourceContent.of() for audio blobs with proper MIME types and TextContent.of() for transcripts with metadata tagging.


# Only record completed items, otherwise request once more
if event.item.status != "completed":
await connection.conversation.item.retrieve(item_id=event.item.id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we request retrieve on incomplete? also, is it possible to receive incomplete on "done" event?

@marcinwolder-miquido marcinwolder-miquido force-pushed the feature/conversation_item_done_event branch from 4a254a8 to 0d36d6e Compare October 23, 2025 15:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/draive/openai/realtime.py (3)

827-833: Apply the same audio fix to user content builder for consistency.

User audio is handled the same way; avoid double base64 here too.

-        if part.audio:
-            parts.append(
-                ResourceContent.of(
-                    part.audio,
-                    mime_type=audio_format,
-                )
-            )
+        if part.audio:
+            audio_bytes: bytes
+            if isinstance(part.audio, str):
+                try:
+                    audio_bytes = urlsafe_b64decode(part.audio)
+                except Exception:
+                    audio_bytes = b64decode(part.audio)
+            else:
+                audio_bytes = part.audio
+            parts.append(ResourceContent.of(audio_bytes, mime_type=audio_format))

585-589: Dict mutated during iteration; iterate over a snapshot to avoid RuntimeError.

Current code deletes keys while iterating current_items.keys().

-    for item_id in current_items.keys():
+    for item_id in list(current_items):
         try:
             await connection.conversation.item.delete(item_id=item_id)
             del current_items[item_id]

293-296: Nit: fix comment typo.

“speach” → “speech”.

-                            # send event that VAD detected input speach
+                            # send event that VAD detected input speech
♻️ Duplicate comments (1)
src/draive/openai/realtime.py (1)

371-371: Replace assertion with graceful handling (duplicate of prior review).

Skip/log empty content instead of raising AssertionError in production.

-                            assert event.item.content  # nosec: B101
+                            if not event.item.content:
+                                ctx.log_warn(
+                                    "Skipping completed item without content",
+                                    item_id=event.item.id,
+                                )
+                                continue
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a254a8 and 0d36d6e.

📒 Files selected for processing (1)
  • src/draive/openai/realtime.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use Python 3.12+ features and syntax across the codebase
Format code exclusively with Ruff (make format); do not use other formatters
Skip module-level docstrings

Files:

  • src/draive/openai/realtime.py
src/draive/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/**/*.py: Import Haiway symbols directly (from haiway import State, ctx)
Use ctx.scope(...) to bind scoped Disposables and active State; avoid global state
Route all logs through ctx.log_debug/info/warn/error; do not use print
Use latest, most strict typing syntax (Python 3.12+), with strict typing only for public APIs
Avoid loose Any except at explicit third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamic getattr except at narrow boundaries
Prefer Mapping/Sequence/Iterable in public types over dict/list/set
Use final where applicable; avoid inheritance and prefer composition
Use precise unions (|) and narrow with match/isinstance; avoid cast unless provably safe and localized
Model immutable data/config and facades with haiway.State; provide ergonomic classmethods like .of(...)
Avoid in-place mutation; use State.updated(...) or functional builders to produce new instances
Access active state via haiway.ctx inside async scopes (ctx.scope(...))
Use @statemethod for public state methods that dispatch on the active instance
Log around generation calls, tool dispatch, and provider requests/responses without leaking secrets; prefer structured/concise messages
Add metrics via ctx.record where applicable
All I/O is async; keep boundaries async and use ctx.spawn for detached tasks
Use structured concurrency and valid coroutine usage; rely on haiway/asyncio; avoid custom threading
Construct multimodal content with MultimodalContent.of(...) and compose blocks explicitly
Use ResourceContent/ResourceReference for media/data blobs
Wrap custom types/data within ArtifactContent; use hidden when needed
Add NumPy-style docstrings for public symbols with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings on internal helpers; keep names self-explanatory
Keep docstrings high-quality; mkdocstrings pulls them into API reference
Never log secrets or full request bodies containing keys/tokens

Files:

  • src/draive/openai/realtime.py
src/draive/{openai,anthropic,mistral,gemini,vllm,ollama,bedrock,cohere}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/draive/{openai,anthropic,mistral,gemini,vllm,ollama,bedrock,cohere}/**/*.py: Provider-specific feature modules live under their respective provider directories
Translate provider/SDK errors into typed exceptions; do not raise bare Exception and preserve context
Use environment variables for credentials and resolve via helper functions like getenv_str

Files:

  • src/draive/openai/realtime.py
🧬 Code graph analysis (1)
src/draive/openai/realtime.py (3)
src/draive/multimodal/content.py (1)
  • audio (161-176)
src/draive/resources/types.py (1)
  • ResourceContent (126-212)
src/draive/multimodal/text.py (1)
  • TextContent (11-82)
🔇 Additional comments (1)
src/draive/openai/realtime.py (1)

349-358: Remove the defensive check; the code is correct as-is.

OpenAI Realtime documentation confirms that conversation.item.done is emitted after an item reaches "completed" status and signals the item finished processing. The unconditional retrieve() call is the correct behavior and does not require a status guard. The original review suggestion was overly cautious.

Comment on lines +856 to +871
if part.audio:
parts.append(
ResourceContent.of(
part.audio,
mime_type=audio_format,
)
)

if part.transcript:
parts.append(
TextContent.of(
part.transcript,
meta={"transcript": True},
)
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Assistant audio persistence likely double-encodes; decode to bytes before ResourceContent.of.

ResourceContent.of(str) base64‑encodes the string; if part.audio is base64, this corrupts audio. Decode to bytes first.

-        if part.audio:
-            parts.append(
-                ResourceContent.of(
-                    part.audio,
-                    mime_type=audio_format,
-                )
-            )
+        if part.audio:
+            # Normalize to raw bytes (handle base64 str or bytes)
+            audio_bytes: bytes
+            if isinstance(part.audio, str):
+                try:
+                    audio_bytes = urlsafe_b64decode(part.audio)
+                except Exception:
+                    audio_bytes = b64decode(part.audio)
+            else:
+                audio_bytes = part.audio
+            parts.append(ResourceContent.of(audio_bytes, mime_type=audio_format))
 
-        if part.transcript:
+        if part.transcript:
             parts.append(
                 TextContent.of(
                     part.transcript,
                     meta={"transcript": True},
                 )
             )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/draive/openai/realtime.py around lines 856 to 871, the code passes
part.audio (likely a base64 string) directly to ResourceContent.of which will
Base64-encode a string and corrupt the audio; update the code to ensure
part.audio is bytes before calling ResourceContent.of by detecting if part.audio
is a str and, if so, decoding it with base64.b64decode (otherwise pass bytes
through), then call ResourceContent.of with the resulting bytes and the same
mime_type.

@KaQuMiQ KaQuMiQ merged commit 20b87a4 into main Oct 29, 2025
3 checks passed
@KaQuMiQ KaQuMiQ deleted the feature/conversation_item_done_event branch October 29, 2025 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants