Add hint tuning tool to evaluate best actions at a given step #285
base: main
Conversation
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| | Sequential Model Queries | |
| | Hardcoded LLM Provider Configuration | |
| | Fragile Action Extraction Logic | |
Files scanned
| File Path | Reviewed |
|---|---|
| src/agentlab/analyze/agent_xray.py | ✅ |
```python
for _ in range(num_queries):
    answer = client(chat_messages)
    content = answer.get("content", "")
    answers.append(content)
```
Sequential Model Queries
What is the issue?
Sequential model queries are being made in a loop, causing unnecessary latency accumulation.
Why this matters
Making sequential API calls significantly increases total response time as each request must wait for the previous one to complete. With multiple queries, this creates a substantial performance bottleneck.
Suggested change
Parallelize the model queries using the async/await pattern or `concurrent.futures` so that multiple requests run simultaneously:

```python
import asyncio

async def get_model_responses(client, chat_messages, num_queries):
    async with asyncio.TaskGroup() as tg:  # requires Python 3.11+
        tasks = [
            tg.create_task(client.acall(chat_messages))
            for _ in range(num_queries)
        ]
    # TaskGroup awaits all tasks on exit, so every result is ready here.
    return [t.result() for t in tasks]
```
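The comment above also mentions `concurrent.futures`; since the surrounding handler code is synchronous, a thread-based variant may be easier to drop in. This is a minimal sketch, assuming `client` is a thread-safe callable as in the loop under review (the function name is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def get_model_responses_threaded(client, chat_messages, num_queries, max_workers=8):
    # Submit all queries at once; results come back in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client, chat_messages) for _ in range(num_queries)]
        return [f.result() for f in futures]
```

`max_workers` caps the number of in-flight API calls, which helps stay within provider rate limits.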
```diff
@@ -858,14 +881,102 @@
     else:
         raise ValueError("Chat messages should be a list of BaseMessage or dict")

-    client = OpenAI()
+    client = AzureChatModel(model_name="gpt-35-turbo", deployment_name="gpt-35-turbo")
```
Hardcoded LLM Provider Configuration
What is the issue?
The code hardcodes the Azure model configuration without providing flexibility for different model providers or configurations.
Why this matters
Users won't be able to use different LLM providers or models for hint tuning, which limits the tool's applicability and contradicts the developer's intent of providing a systematic approach for evaluation.
Suggested change
Create a configurable model provider system:

```python
def get_model_client(provider="azure", model_name="gpt-35-turbo"):
    if provider == "azure":
        return AzureChatModel(model_name=model_name, deployment_name=model_name)
    elif provider == "openai":
        return OpenAIChatModel(model_name=model_name)
    elif provider == "openrouter":
        return OpenRouterChatModel(model_name=model_name)
    else:
        raise ValueError(f"Unsupported provider: {provider}")

# Usage in submit_action:
client = get_model_client(provider="azure", model_name="gpt-35-turbo")
```
```python
    answers.append(content)

    # Extract action part using regex
    action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)
```
Fragile Action Extraction Logic
What is the issue?
The code assumes actions are always wrapped in `<action>` tags, but doesn't handle cases where the model output is malformed or uses a different format.
Why this matters
The action extraction could fail silently when the model output doesn't follow the expected format, leading to incomplete or misleading action distribution analysis.
Suggested change
Add robust action extraction with error handling:

```python
import re

def extract_action(content: str) -> str | None:
    # Try XML-style tags
    action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
    # Try markdown-style formatting
    action_match = re.search(r'`action: (.*?)`', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
    # Try plain text format
    action_match = re.search(r'Action: (.*?)(?:\n|$)', content)
    if action_match:
        return action_match.group(1).strip()
    return None

# Usage in loop:
action = extract_action(content)
if action:
    actions.append(action)
else:
    print(f"Warning: Could not extract action from response: {content[:100]}...")
```
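To sanity-check the fallback chain, the three patterns can be exercised on hand-written model outputs. The sample strings below are illustrative only, not taken from the PR:

```python
import re

samples = [
    "Reasoning first...\n<action>click('42')</action>",  # XML-style tags
    "`action: fill('7', 'hello')`",                      # markdown-style
    "Action: scroll(0, 200)\nThen wait.",                # plain text
]

patterns = [
    (r'<action>(.*?)</action>', re.DOTALL),
    (r'`action: (.*?)`', re.DOTALL),
    (r'Action: (.*?)(?:\n|$)', 0),
]

# Each sample should match exactly one pattern in the fallback order.
for text in samples:
    for pat, flags in patterns:
        m = re.search(pat, text, flags)
        if m:
            print(m.group(1).strip())
            break
# Prints: click('42'), then fill('7', 'hello'), then scroll(0, 200)
```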
Description by Korbit AI
What change is being made?
Add a hint tuning tool to evaluate and visualize the best actions at a given step by querying the language model multiple times, extracting actions, and displaying them graphically.
Why are these changes being made?
These changes let users compare the actions the model proposes against the action actually taken at the current step, by sampling multiple predictions and displaying their distribution graphically. This aids in tuning hints and analyzing the model's responses for more accurate decision-making. A standard plotting approach and new GUI elements keep the feature user-friendly and informative.
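The "action distribution" described above can be summarized as a frequency count over the extracted actions before plotting. A minimal sketch with illustrative names (not taken from the PR code):

```python
from collections import Counter

def action_distribution(actions):
    # Count identical extracted actions; most_common() returns a sorted
    # (action, count) list that maps directly onto a bar chart.
    return Counter(actions).most_common()

dist = action_distribution([
    "click('42')", "click('42')", "scroll(0, 200)", "click('42')",
])
print(dist)  # [("click('42')", 3), ('scroll(0, 200)', 1)]
```

With the counts in hand, any standard plotting library can render the bars; the counting step is independent of the GUI.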