
Conversation


@blefo blefo commented Oct 2, 2025

This pull request fixes the streaming chat completion endpoint so that sources are included in the response when web search is enabled. It also adds a unit test to verify that sources are properly included in the final streamed payload.

Streaming response improvements:

  • Refactored the streaming logic in chat_completion_stream_generator to buffer payloads, append sources only on the final chunk (when finish_reason == "stop"), and yield JSON-encoded SSE events; also improved token usage tracking and error handling in the stream (see the sketch below). [1] [2] [3]
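
A minimal sketch of that flow, with placeholder names for the upstream stream object and the collected sources (the actual signature in private.py may differ):

    import json
    from typing import Any, AsyncIterator

    async def chat_completion_stream_generator(
        stream: AsyncIterator[Any], sources: list[dict]
    ) -> AsyncIterator[str]:
        # Sketch only: buffer each chunk as a dict, attach web search sources
        # on the final chunk (finish_reason == "stop"), and emit SSE events.
        async for chunk in stream:
            payload = chunk.model_dump(exclude_unset=True)
            choices = payload.get("choices") or [{}]
            if choices[0].get("finish_reason") == "stop" and sources:
                payload["sources"] = sources
            yield f"data: {json.dumps(payload)}\n\n"
        yield "data: [DONE]\n\n"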

Testing enhancements:

  • Added a new unit test test_chat_completion_stream_includes_sources to verify that web search sources are only included in the final streamed payload, ensuring correct SSE formatting and payload structure (a rough sketch follows).
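
A rough sketch of such a test; the route path, request fields, and the client fixture are assumptions rather than the project's actual test code:

    import json

    def _parse_sse(body: str) -> list[dict]:
        # Decode every "data: {...}" event, skipping the terminating [DONE] marker.
        return [
            json.loads(line[len("data: "):])
            for line in body.splitlines()
            if line.startswith("data: ") and line != "data: [DONE]"
        ]

    def test_chat_completion_stream_includes_sources(client):
        # Hypothetical request shape; the real test mocks the model backend and web search.
        response = client.post(
            "/v1/chat/completions",
            json={
                "model": "test-model",
                "messages": [{"role": "user", "content": "hi"}],
                "stream": True,
                "web_search": True,
            },
        )
        events = _parse_sse(response.text)
        # Sources must appear only in the final streamed payload.
        assert "sources" in events[-1]
        assert all("sources" not in event for event in events[:-1])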

Code cleanup:

  • Removed unused imports (asyncio) and added necessary ones (json) in both the main router and test files. [1] [2]

@blefo blefo requested a review from Copilot October 2, 2025 11:33

Copilot AI left a comment

Pull Request Overview

This pull request fixes the streaming chat completion endpoint to properly include web search sources in the final streamed payload when web search is enabled. The implementation ensures sources are only added to the last chunk, whose finish_reason is "stop".

Key changes:

  • Refactored streaming logic to buffer payloads and append sources only on the final chunk
  • Added comprehensive unit test to verify sources are included correctly in streamed responses
  • Improved error handling and token usage tracking in the streaming generator

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • nilai-api/src/nilai_api/routers/private.py: Refactored streaming logic to buffer payloads, add sources to the final chunk, and improve error handling
  • tests/unit/nilai_api/routers/test_private.py: Added a unit test to verify sources are properly included in the final streamed payload


@blefo blefo requested a review from jcabrero October 2, 2025 11:35
Comment on lines 352 to 355
    try:
        stop_condition = payload["choices"][0].get("finish_reason")
    except Exception:
        stop_condition = None

Member

This is an important issue.

In most programming languages, try/except blocks are expensive to run and slow programs down: https://paltman.com/try-except-performance-in-python-a-simple-test/.
That is not a problem when the block executes just once and normally goes through the "try" branch, but this one runs for every single token produced by the stream routine, and it goes through the except branch even more often. If this snippet slowed the program down by 1 ms, a 1000-token generation would already cost that user an extra second.

Easy option that works most of the time: use get("abc", None) and check for None.

Even easier option for this case: I see you're updating to if chunk.usage is not None. In the current setup, chunk.usage is only reported on the last streamed chunk, so you can use the fact that chunk.usage is not None to know it is the last chunk and attach the sources at that point. Both options are sketched below.
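
Both options, sketched with hypothetical helper names:

    def finish_reason_of(payload: dict) -> str | None:
        # Option 1: dict.get() with a default instead of try/except in the
        # per-token loop; returns None when the key is absent.
        choices = payload.get("choices") or [{}]
        return choices[0].get("finish_reason")

    def is_final_chunk(chunk) -> bool:
        # Option 2: usage is only reported on the last streamed chunk, so its
        # presence alone marks the end of the stream.
        return chunk.usage is not None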

Comment on lines 340 to 351
    payload = chunk.model_dump(exclude_unset=True)

    if chunk.usage is not None:
        last_seen_usage = payload.get("usage", last_seen_usage)
        if last_seen_usage:
            prompt_token_usage = last_seen_usage.get(
                "prompt_tokens", prompt_token_usage
            )
            completion_token_usage = last_seen_usage.get(
                "completion_tokens", completion_token_usage
            )

Member

This aligns with what I told you on Friday: you should not dump the model. The response is a Pydantic model from the OpenAI client, so fields such as "choices" and "finish_reason" are guaranteed to exist on the object; they may be None when they are optional, which you can easily check. See the sketch below.
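
A minimal sketch of that typed access, assuming the OpenAI client's ChatCompletionChunk type (the exact handling in private.py may differ):

    from openai.types.chat import ChatCompletionChunk

    def inspect_chunk(chunk: ChatCompletionChunk) -> tuple[bool, int | None, int | None]:
        # Typed attribute access instead of model_dump(): the fields exist on
        # the Pydantic model and only need a None check where they are optional.
        finish_reason = chunk.choices[0].finish_reason if chunk.choices else None
        is_last = finish_reason == "stop"
        if chunk.usage is not None:
            return is_last, chunk.usage.prompt_tokens, chunk.usage.completion_tokens
        return is_last, None, None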

Member Author

The usage parameter is now used for the stop condition. This works because I updated continuous_usage_stats so that usage only appears in the last chunk (see the sketch of the request options below).
The Pydantic object is used directly instead of a dictionary.
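
For illustration, assuming a vLLM-style backend where stream_options controls usage reporting (the placement of these options here is an assumption, not taken from this diff):

    request_body = {
        "model": "some-model",  # placeholder model name
        "messages": [{"role": "user", "content": "hello"}],
        "stream": True,
        # include_usage asks the backend to report token usage; with
        # continuous_usage_stats disabled, usage appears only on the final
        # chunk, so its presence marks the end of the stream.
        "stream_options": {"include_usage": True, "continuous_usage_stats": False},
    }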

@jcabrero jcabrero left a comment

Please have a look at the comments.

@blefo blefo force-pushed the fix-steram-includes-sources branch from b694367 to 5425fcc Compare October 6, 2025 13:06
@blefo blefo requested a review from jcabrero October 6, 2025 14:10

@jcabrero jcabrero left a comment

👍 LGTM

@blefo blefo merged commit 17fefa1 into main Oct 6, 2025
8 checks passed