-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Add custom token_ids method #2264
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
class BaseChatOpenAI(ChatOpenAI): | ||
usage_metadata: dict = {} | ||
custom_get_token_ids = custom_get_token_ids | ||
|
||
def get_last_generation_info(self) -> Optional[Dict[str, Any]]: | ||
return self.usage_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several points to address:
-
Function Overloading: In Python, function overloading is not directly supported like some other languages. However, you can achieve similar functionality using default arguments or decorators.
-
Custom Function Placement: The placement of
custom_get_token_ids
inside the class body makes it difficult to understand its logic and potential side effects. Consider moving the entire implementation of this method into a separate file if needed. -
Class Method Access: Direct access to instance methods from the base class (
TokenizerManage
) might violate encapsulation principles unless specifically allowed by design. If possible, consider making these methods accessible via instance attributes. -
Docstring Consistency: Ensure consistent use of docstrings across the codebase. It's important for others (or future you) to understand what each method does.
-
Usage Metadata Handling: The presence of
usage_metadata
suggests that the class might be used in contexts where request tracking is required. Verify there aren't any unhandled cases where requests don’t update or clear this metadata, which could lead to incorrect metrics or inconsistencies. -
Deprecation warnings: Using deprecated functions or modules should be discouraged since they may not receive updates or support in the future. Ensure all dependencies are up-to-date and compatible with your project's needs.
-
Testing Strategy: Implement comprehensive tests, particularly unit tests, to cover different scenarios and edge cases, including how
custom_get_token_ids
interacts with dependent methods.
Based on these considerations, here’s an improved version of your code:
from typing import Optional, Dict, Any
class BaseChatOpenAI:
usage_metadata: dict = {}
def __init__(self, custom_get_token_ids=None):
self.custom_get_token_ids = custom_get_token_ids
@classmethod
def set_custom_get_token_ids(cls, func):
cls.custom_get_token_ids = staticmethod(func)
def get_last_generation_info(self) -> Optional[Dict[str, Any]]:
return self.usage_metadata
# Example usage to register a custom function
@BaseChatOpenAI.set_custom_get_token_ids
def custom_get_token_ids(text: str):
# Implementation of token encoding
pass
# Instantiate the class with the custom get_token_ids function
chat_openai_instance = BaseChatOpenAI(custom_get_token_ids)
This refactored code ensures proper separation of concerns and adheres to Python naming conventions, while also providing guidance on best practices for maintaining robust and modular software.
perf: Add custom token_ids method