Add hint tuning tool to evaluate best actions at a given step #285
base: main
Conversation
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| | Sequential Model Queries | |
| | Hardcoded LLM Provider Configuration | |
| | Fragile Action Extraction Logic | |
Files scanned
| File Path | Reviewed |
|---|---|
| src/agentlab/analyze/agent_xray.py | ✅ |
```python
for _ in range(num_queries):
    answer = client(chat_messages)
    content = answer.get("content", "")
    answers.append(content)
```
Sequential Model Queries
What is the issue?
Sequential model queries are being made in a loop, causing unnecessary latency accumulation.
Why this matters
Making sequential API calls significantly increases total response time as each request must wait for the previous one to complete. With multiple queries, this creates a substantial performance bottleneck.
Suggested change
Parallelize the model queries using the async/await pattern or `concurrent.futures` so that multiple requests run simultaneously:

```python
import asyncio

async def get_model_responses(client, chat_messages, num_queries):
    async with asyncio.TaskGroup() as tg:  # requires Python 3.11+
        tasks = [
            tg.create_task(client.acall(chat_messages))
            for _ in range(num_queries)
        ]
    # TaskGroup awaits all tasks on exit, so every result is ready here.
    return [t.result() for t in tasks]
```
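The comment above also mentions `concurrent.futures`; since the surrounding handler code is synchronous, a thread-based variant may be easier to drop in. This is a minimal sketch, assuming `client` is a thread-safe callable as in the loop under review (the function name is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def get_model_responses_threaded(client, chat_messages, num_queries, max_workers=8):
    # Submit all queries at once; results come back in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client, chat_messages) for _ in range(num_queries)]
        return [f.result() for f in futures]
```

`max_workers` caps the number of in-flight API calls, which helps stay within provider rate limits.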
```diff
@@ -858,14 +881,102 @@
     else:
         raise ValueError("Chat messages should be a list of BaseMessage or dict")

-    client = OpenAI()
+    client = AzureChatModel(model_name="gpt-35-turbo", deployment_name="gpt-35-turbo")
```
Hardcoded LLM Provider Configuration
What is the issue?
The code hardcodes the Azure model configuration without providing flexibility for different model providers or configurations.
Why this matters
Users won't be able to use different LLM providers or models for hint tuning, which limits the tool's applicability and contradicts the developer's intent of providing a systematic approach for evaluation.
Suggested change
Create a configurable model provider system:

```python
def get_model_client(provider="azure", model_name="gpt-35-turbo"):
    if provider == "azure":
        return AzureChatModel(model_name=model_name, deployment_name=model_name)
    elif provider == "openai":
        return OpenAIChatModel(model_name=model_name)
    elif provider == "openrouter":
        return OpenRouterChatModel(model_name=model_name)
    else:
        raise ValueError(f"Unsupported provider: {provider}")

# Usage in submit_action:
client = get_model_client(provider="azure", model_name="gpt-35-turbo")
```
```python
    answers.append(content)

    # Extract action part using regex
    action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)
```
Fragile Action Extraction Logic
What is the issue?
The code assumes actions are always wrapped in `<action>` tags, but doesn't handle cases where the model output is malformed or uses a different format.
Why this matters
The action extraction could fail silently when the model output doesn't follow the expected format, leading to incomplete or misleading action distribution analysis.
Suggested change
Add robust action extraction with error handling:

```python
import re

def extract_action(content: str) -> str | None:
    # Try XML-style tags
    action_match = re.search(r'<action>(.*?)</action>', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
    # Try markdown-style formatting
    action_match = re.search(r'`action: (.*?)`', content, re.DOTALL)
    if action_match:
        return action_match.group(1).strip()
    # Try plain text format
    action_match = re.search(r'Action: (.*?)(?:\n|$)', content)
    if action_match:
        return action_match.group(1).strip()
    return None

# Usage in loop:
action = extract_action(content)
if action:
    actions.append(action)
else:
    print(f"Warning: Could not extract action from response: {content[:100]}...")
```
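To sanity-check the fallback chain, the three patterns can be exercised on hand-written model outputs. The sample strings below are illustrative only, not taken from the PR:

```python
import re

samples = [
    "Reasoning first...\n<action>click('42')</action>",  # XML-style tags
    "`action: fill('7', 'hello')`",                      # markdown-style
    "Action: scroll(0, 200)\nThen wait.",                # plain text
]

patterns = [
    (r'<action>(.*?)</action>', re.DOTALL),
    (r'`action: (.*?)`', re.DOTALL),
    (r'Action: (.*?)(?:\n|$)', 0),
]

# Each sample should match exactly one pattern in the fallback order.
for text in samples:
    for pat, flags in patterns:
        m = re.search(pat, text, flags)
        if m:
            print(m.group(1).strip())
            break
# Prints: click('42'), then fill('7', 'hello'), then scroll(0, 200)
```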
Description by Korbit AI
What change is being made?
Add a hint tuning tool to evaluate and visualize the best actions at a given step by querying the language model multiple times, extracting actions, and displaying them graphically.
Why are these changes being made?
These changes let users compare the actions the model proposes against the action actually taken at the current step, by sampling multiple predictions and displaying their distribution graphically. This aids in tuning hints and analyzing the model's responses for more accurate decision-making. A standard plotting approach and new GUI elements keep the feature user-friendly and informative.
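The "action distribution" described above can be summarized as a frequency count over the extracted actions before plotting. A minimal sketch with illustrative names (not taken from the PR code):

```python
from collections import Counter

def action_distribution(actions):
    # Count identical extracted actions; most_common() returns a sorted
    # (action, count) list that maps directly onto a bar chart.
    return Counter(actions).most_common()

dist = action_distribution([
    "click('42')", "click('42')", "scroll(0, 200)", "click('42')",
])
print(dist)  # [("click('42')", 3), ('scroll(0, 200)', 1)]
```

With the counts in hand, any standard plotting library can render the bars; the counting step is independent of the GUI.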