Conversation

@xiaomingservicenow xiaomingservicenow commented Aug 27, 2025

  • Add a hint tuning tool in agentlab xray
    • Supports querying the LLM N times and displays the distribution of likely actions for an editable prompt at a given step
    • Compares the actions taken during experiments against the most likely actions produced by the tuning tool

Description by Korbit AI

What change is being made?

Add a hint tuning tool to evaluate and visualize the best actions at a given step by querying the language model multiple times, extracting actions, and displaying them graphically.

Why are these changes being made?

This feature enhancement allows users to better understand and compare the predictive actions suggested by the model against the current step's action by evaluating multiple predictions and displaying the action distribution graphically. It aids in fine-tuning and analyzing the model's responses for improved decision-making accuracy. The integration of a standard plotting approach and new GUI elements ensures the feature is user-friendly and informative.
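The "action distribution" step described above can be sketched as a simple tally over the sampled actions. This is an illustrative sketch, not the tool's actual code: the action strings and N=5 samples are made up, and in the tool these frequencies would feed the graphical display.

```python
from collections import Counter

# Illustrative samples, as if the LLM had been queried N=5 times at one step.
sampled_actions = [
    "click('submit')",
    "click('submit')",
    "fill('search', 'hello')",
    "click('submit')",
    "scroll(0, 200)",
]

# Tally how often each candidate action was proposed across the N queries.
counts = Counter(sampled_actions)
for action, n in counts.most_common():
    print(f"{action}: {n}/{len(sampled_actions)}")
```

The most frequent action in this tally is what the tool would compare against the action actually taken at that step in the experiment.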


@korbit-ai korbit-ai bot left a comment


Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category         Issue
Performance      Sequential Model Queries
Design           Hardcoded LLM Provider Configuration
Error Handling   Fragile Action Extraction Logic

Files scanned:
src/agentlab/analyze/agent_xray.py


Comment on lines +892 to +895
for _ in range(num_queries):
    answer = client(chat_messages)
    content = answer.get("content", "")
    answers.append(content)

Sequential Model Queries (category: Performance)

What is the issue?

Sequential model queries are being made in a loop, causing unnecessary latency accumulation.

Why this matters

Making sequential API calls significantly increases total response time as each request must wait for the previous one to complete. With multiple queries, this creates a substantial performance bottleneck.

Suggested change ∙ Feature Preview

Parallelize the model queries using async/await pattern or concurrent.futures to make multiple requests simultaneously:

import asyncio

async def get_model_responses(client, chat_messages, num_queries):
    # Requires Python 3.11+ for asyncio.TaskGroup
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(client.acall(chat_messages))
            for _ in range(num_queries)
        ]
    return [t.result() for t in tasks]
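Since the suggestion also mentions concurrent.futures, a thread-based variant may be simpler when the client is a synchronous callable. This sketch assumes the `client(chat_messages)` calling convention from the snippet under review; `fake_client` is a stand-in for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor

def get_model_responses_threaded(client, chat_messages, num_queries, max_workers=8):
    """Fire num_queries identical requests concurrently; result order is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client, chat_messages) for _ in range(num_queries)]
        return [f.result() for f in futures]

# Stand-in client for demonstration; the real client would call the LLM API.
fake_client = lambda messages: {"content": "<action>noop()</action>"}
answers = get_model_responses_threaded(fake_client, [], num_queries=4)
```

Threads are adequate here because the work is I/O-bound (waiting on API responses), so no async client method is needed.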

@@ -858,14 +881,102 @@
else:
raise ValueError("Chat messages should be a list of BaseMessage or dict")

-client = OpenAI()
+client = AzureChatModel(model_name="gpt-35-turbo", deployment_name="gpt-35-turbo")

Hardcoded LLM Provider Configuration (category: Design)

What is the issue?

The code hardcodes the Azure model configuration without providing flexibility for different model providers or configurations.

Why this matters

Users won't be able to use different LLM providers or models for hint tuning, which limits the tool's applicability and contradicts the developer's intent of providing a systematic approach for evaluation.

Suggested change ∙ Feature Preview

Create a configurable model provider system:

def get_model_client(provider="azure", model_name="gpt-35-turbo"):
    if provider == "azure":
        return AzureChatModel(model_name=model_name, deployment_name=model_name)
    elif provider == "openai":
        return OpenAIChatModel(model_name=model_name)
    elif provider == "openrouter":
        return OpenRouterChatModel(model_name=model_name)
    else:
        raise ValueError(f"Unsupported provider: {provider}")

# Usage in submit_action:
client = get_model_client(provider="azure", model_name="gpt-35-turbo")

answers.append(content)

# Extract action part using regex
action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)

Fragile Action Extraction Logic (category: Error Handling)

What is the issue?

The code assumes actions are always wrapped in tags, but doesn't handle cases where the model output might be malformed or use different formats.

Why this matters

The action extraction could fail silently when the model output doesn't follow the expected format, leading to incomplete or misleading action distribution analysis.

Suggested change ∙ Feature Preview

Add robust action extraction with error handling:

def extract_action(content: str) -> str | None:
    # Try XML-style tags
    action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
    
    # Try markdown-style formatting
    action_match = re.search(r'`action: (.*?)`', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
        
    # Try plain text format
    action_match = re.search(r'Action: (.*?)(?:\n|$)', content)
    if action_match:
        return action_match.group(1).strip()
    
    return None

# Usage in loop:
action = extract_action(content)
if action:
    actions.append(action)
else:
    print(f"Warning: Could not extract action from response: {content[:100]}...")
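As a quick sanity check, the extractor above can be exercised against each fallback format. This self-contained copy restructures the suggestion as a loop over patterns (equivalent behavior); the sample strings are illustrative.

```python
import re

def extract_action(content):
    """Try several output formats in order; return None when nothing matches."""
    for pattern in (
        r'<action>(.*?)</action>',    # XML-style tags
        r'`action: (.*?)`',           # markdown-style formatting
        r'Action: (.*?)(?:\n|$)',     # plain text format
    ):
        match = re.search(pattern, content, re.DOTALL)
        if match:
            return match.group(1).strip()
    return None

print(extract_action("reasoning... <action>click('ok')</action>"))
print(extract_action("I will now `action: scroll(0, 100)`"))
print(extract_action("Action: fill('q', 'hi')\nmore text"))
print(extract_action("no structured output"))
```

Falling through all patterns returns None rather than raising, so the caller can log the malformed response and keep the distribution analysis going.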

@xiaomingservicenow xiaomingservicenow marked this pull request as draft August 27, 2025 19:16