Refactor LLM chats: separate streaming logic and enforce strict typing #12
base: ci
Conversation
Thanks for the refactoring, which is really helpful and much clearer! I left some questions/comments to understand more. We can keep iterating on it.
```python
"""
Defines a chat agent that interacts with a Large Language Model (LLM).
Design Note:
```
nit: shall we update here too?
Good idea, I will get back to that once the internal API is more stable.
```python
@dataclasses.dataclass
class FunctionCall:
```
Shall we also consider thought signatures for reasoning models? Context here. Or is it already covered by `LLMMessage.thinking`?
I would love to have it as well, perhaps in a separate PR, as this one is already complicated enough.
```python
    temperature: float | None = 0,
    seed: int = 0,
    tools: list[Any] | None = None,
) -> LLMMessage[str]:
```
Being explicit is nice! I wonder if we also want to keep the flexibility for other parameters, e.g., the config.
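One way to keep that escape hatch while staying explicit might look like the sketch below; `GenerationConfig` and the `config` parameter are hypothetical names for illustration, not part of this PR:

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class GenerationConfig:
    # Hypothetical bag for provider-specific options that do not deserve
    # their own keyword argument.
    top_p: float | None = None
    max_output_tokens: int | None = None
    extra: dict[str, Any] | None = None


def invoke(
    prompt: str,
    *,
    temperature: float | None = 0,
    seed: int = 0,
    tools: list[Any] | None = None,
    config: GenerationConfig | None = None,  # everything else goes here
) -> str:
    raise NotImplementedError
```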
```python
    return result


class ModelProxyOpenAI(OpenAI):
```
Do we still support streaming output for Model Proxy?
We should, but I would rather wait until they fully support the Responses API so we don't have to implement it twice.
Major refactor of the LLM chat architecture to improve code organization, maintainability, and type safety.

Key Changes:
- Split `LLMChat` subclasses into distinct Non-Streaming and Streaming implementations. Streaming logic (primarily for notebooks) was complicating the core classes; this split makes primary actors more concise and less error-prone.
- Moved provider-specific implementations into separate files: `openai.py` and `genai.py`.
- Replaced the generic `LLMResponse` with a strictly typed version, specifically enforcing types for `tool_usage` and `token_usage`.
- Updated the `invoke` method to accept explicit arguments.
- Migrated the OpenAI integration from the `completion` API to the more user-friendly `responses` API.

Testing:
- Added coverage for common use cases using real APIs (tests run conditionally if environment keys are present).
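For orientation, here is a minimal sketch of the shapes this description refers to, pieced together from the diff snippets quoted in the review below. Class names such as `NonStreamingLLMChat`, the `Usage` fields, and the exact `FunctionCall` layout are assumptions, not a copy of the implementation:

```python
import dataclasses
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class FunctionCall:
    # A single tool call requested by the model, with its parsed arguments
    # and (optionally) the tool's output once executed.
    name: str
    arguments: dict[str, Any]
    output: Any | None = None


@dataclasses.dataclass
class Usage:
    # Token accounting; the exact field names here are an assumption.
    input_tokens: int = 0
    output_tokens: int = 0


@dataclasses.dataclass
class LLMMessage(Generic[T]):
    # Strictly typed reply: typed content plus optional reasoning text,
    # tool calls, and token usage. (The PR's LLMMessage derives from
    # messages.Message[T]; a plain dataclass is used here to stay
    # self-contained.)
    content: T
    thinking: str | None = None
    tool_calls: list[FunctionCall] | None = None
    usage: Usage | None = None


class NonStreamingLLMChat:
    # Non-streaming actor; a Streaming sibling would emit chunks instead.
    def invoke(
        self,
        prompt: str,
        *,
        temperature: float | None = 0,
        seed: int = 0,
        tools: list[Any] | None = None,
    ) -> LLMMessage[str]:
        raise NotImplementedError
```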
```python
answer = response.content
response._meta.update(chat=chat, schema=schema, raw_content=answer, **kwargs)
response._meta.update(
```
Do we still use _meta?
develra left a comment:
LGTM. It's a bit hard for me to tell, as someone fairly unfamiliar with this code, whether the test coverage is sufficient to be confident that these changes are safe. I think it would be good to reason through what might break as a result of these changes and make sure we have test coverage for it, especially given the somewhat sensitive timing of a new launch.
```yaml
args: ["run", "--group", "test", "pytest", "tests"]
waitFor: ["push-image"]
env:
  - "LLM_DEFAULT=x"
```
It really doesn't matter, but I'm wondering why the MODEL_PROXY_* ones dropped the letter while LLM_DEFAULT didn't?
```python
    arguments=json.loads(item.arguments),
    output=output,
)
# {"name": item.name, "arguments": item.arguments, "output": output}
```
nit - clean up if this is unused
```python
{
    "role": message.sender.role
    if message.sender.role != "tool"
    else "system",  # TODO: Remove this renaming once ModelProxy supports tools
```
Looking at this TODO - do we know if that is on the roadmap for ModelProxy?
```python
if not self.support_structured_outputs and schema_instructions:
    raw_messages.append(
        {
            "role": "system",
```
Some models don't support the system role; shall we use user instead?
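For example, a tiny sketch of that fallback (the `supports_system_role` flag is a hypothetical name; the quoted diff only has `support_structured_outputs`):

```python
def schema_instruction_message(schema_instructions: str, supports_system_role: bool) -> dict[str, str]:
    # Use the "system" role where the model accepts it, otherwise fall back
    # to a "user" message carrying the same instructions.
    role = "system" if supports_system_role else "user"
    return {"role": role, "content": schema_instructions}
```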
```python
    temperature: float | None = 0,
    seed: int = 0,
    tools: list[Any] | None = None,
) -> LLMMessage[str]:
```
For invoke, shall we return LLMMessage[T] for image output etc?
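For illustration, one hypothetical way to parameterize the return type (not from this PR; `output_type` is an invented parameter, and `LLMMessage` refers to the sketch earlier in this thread):

```python
from typing import TypeVar

T = TypeVar("T")


def invoke(prompt: str, output_type: type[T] = str) -> "LLMMessage[T]":
    # The requested output type would drive the return parameter, so an
    # image request could yield LLMMessage[Image] instead of LLMMessage[str].
    raise NotImplementedError
```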
```python
    tool_calls: list[FunctionCall] | None = None
    usage: Usage | None = None

    def add_chunk(self, chunk: str):
```
Curious what the different uses of this new method and Message.stream are?
```python
@dataclasses.dataclass
class LLMMessage(messages.Message[T]):
    content: T
```
Looking at the add_chunk method, do we essentially assume content is always a string? If so, shall we just rename it to text or similar?
We could add another field, e.g. image, for image output in the future.
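Something along these lines, purely as a hypothetical shape (neither `text` nor `image` exists in the PR):

```python
import dataclasses


@dataclasses.dataclass
class LLMMessage:
    # Hypothetical rename: "text" for the string payload that add_chunk
    # appends to, plus a separate slot for future image output.
    text: str = ""
    image: bytes | None = None

    def add_chunk(self, chunk: str) -> None:
        # Streaming chunks are always text, which is what motivates the rename.
        self.text += chunk
```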
```python
    content: T
    _status: utils.Status = utils.Status.RUNNING
    thinking: str | None = None
    tool_calls: list[FunctionCall] | None = None
```
I think these fields are also useful to Message, do you think so?
Discussed with @s-alexey that it would be great for us to test more existing examples to avoid backward incompatibility.