
tool-quality timing out on larger tools #5

@therealnb

Description


Config

# Frontier model configuration
export DEFAULT_MODEL__PROVIDER=openrouter
export DEFAULT_MODEL__NAME=anthropic/claude-3.5-sonnet
export DEFAULT_MODEL__TIMEOUT=30
export DEFAULT_MODEL__MAX_RETRIES=3

# API key (required for OpenRouter)
export OPENROUTER_API_KEY=sk-or-...
export TEF_API_KEY=sk-or-...
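The double-underscore variable names suggest that nested model settings are read from the environment, in the style of pydantic-settings' env_nested_delimiter. That is an assumption; this issue does not confirm how mcp-tef loads them. The following is a minimal, dependency-free sketch of that convention. It shows how DEFAULT_MODEL__TIMEOUT and DEFAULT_MODEL__MAX_RETRIES would end up as fields of a single "default model" section. The helper parse_nested_env is hypothetical.

```python
from typing import Any, Mapping


def parse_nested_env(
    env: Mapping[str, str], prefix: str = "DEFAULT_MODEL", delimiter: str = "__"
) -> dict[str, Any]:
    """Group PREFIX__KEY variables into a dict (hypothetical helper, not mcp-tef code)."""
    section: dict[str, Any] = {}
    for name, value in env.items():
        parts = name.split(delimiter)
        if len(parts) < 2 or parts[0] != prefix:
            continue  # not part of this settings section
        node = section
        for part in parts[1:-1]:  # deeper nesting, e.g. PREFIX__A__B
            node = node.setdefault(part.lower(), {})
        # Naive coercion: all-digit strings become ints; everything else stays a string.
        node[parts[-1].lower()] = int(value) if value.isdigit() else value
    return section


example_env = {
    "DEFAULT_MODEL__PROVIDER": "openrouter",
    "DEFAULT_MODEL__NAME": "anthropic/claude-3.5-sonnet",
    "DEFAULT_MODEL__TIMEOUT": "30",
    "DEFAULT_MODEL__MAX_RETRIES": "3",
    "OPENROUTER_API_KEY": "sk-or-...",  # no DEFAULT_MODEL prefix, so ignored
}
print(parse_nested_env(example_env))
# {'provider': 'openrouter', 'name': 'anthropic/claude-3.5-sonnet', 'timeout': 30, 'max_retries': 3}
```

A real settings library would also validate types and apply defaults. The sketch only shows the grouping.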

The GitHub MCP server exposes 26 tools, so the evaluation times out even with --timeout 120. The default timeout is 60 seconds.

mtef tool-quality --server-urls http://localhost:8080/github/mcp --model-provider openrouter --model-name anthropic/claude-3.5-sonnet --url https://localhost:8000 --insecure --verbose --timeout 120
ℹ Using mcp-tef at https://localhost:8000
✗ Request timed out
  The LLM evaluation may take longer for servers with many tools.
  Try increasing the timeout with --timeout (e.g., --timeout 120)
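Here is a back-of-the-envelope way to see why the tool count matters. It assumes three things the issue does not state: the server makes one model call per tool, the calls run sequentially, and MAX_RETRIES counts extra attempts after the first. Under those assumptions, both the worst case and any sensible timeout grow linearly with the number of tools. The helper names and the seconds_per_tool figure are hypothetical placeholders, not measurements.

```python
import math


def worst_case_seconds(n_tools: int, call_timeout: float, max_retries: int) -> float:
    """Upper bound on evaluation time if every per-tool model call exhausts its retries.

    Assumptions (not confirmed by mcp-tef): one model call per tool, calls run
    sequentially, and max_retries counts extra attempts after the first.
    """
    attempts_per_call = 1 + max_retries
    return n_tools * call_timeout * attempts_per_call


def suggested_cli_timeout(
    n_tools: int, seconds_per_tool: float, overhead: float = 10.0, margin: float = 1.5
) -> int:
    """Hypothetical heuristic: linear in tool count, plus fixed overhead and a safety margin."""
    return math.ceil((overhead + n_tools * seconds_per_tool) * margin)


# Values from the config above: DEFAULT_MODEL__TIMEOUT=30, DEFAULT_MODEL__MAX_RETRIES=3.
print(worst_case_seconds(26, 30, 3))  # GitHub server: 3120
print(worst_case_seconds(3, 30, 3))   # optimizer server: 360
# seconds_per_tool=8.0 is a placeholder, not a measured latency.
print(suggested_cli_timeout(26, 8.0))  # 327
```

Under these assumptions, even the three-tool worst case exceeds 120 seconds. The optimizer run finishing therefore only shows that typical calls are much faster than the worst case. A single fixed default cannot cover both small and large servers well.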

The optimizer server, which exposes only three tools, works fine with the same settings:

(mcp-tef) nigels-MacBook-Pro:mcp-tef nigel$ mtef tool-quality --server-urls http://localhost:8080/mcp-optimizer/mcp --model-provider openrouter --model-name anthropic/claude-3.5-sonnet --url https://localhost:8000 --insecure --verbose --timeout 120
ℹ Using mcp-tef at https://localhost:8000

Tool Quality Evaluation Results
============================================================

Tool: find_tool
  Description: "
Find and return tools from RUNNING servers that can help accomplish the user's request.

This searches only currently running MCP servers. If no relevant tools are found,
use search_registry() to discover tools from servers available in the registry.

Use this function when you need to:
- Discover what tools are available for a specific task
- Find the right tool(s) before attempting to solve a problem
- Check if required functionality exists in the current environment

Args:
    tool_description: Description of the task or capability needed
               (e.g., "web search", "analyze CSV file", "send an email")
    tool_keywords: Space-separated keywords of the task or capability needed.
        These will be used for BM25 text search on available tools.
        (e.g. "list issues github", "SQL query postgres", "Grafana requests slow").

Returns:
    dict: A dictionary containing:
        - tools: List of available tools matching the query, including:
            * Tool names and descriptions
            * Server names (in the mcp_server_name field)
            * Required parameters and schemas
            * Usage examples where applicable
        - token_metrics: Token efficiency metrics showing:
            * baseline_tokens: Total tokens for all running server tools
            * returned_tokens: Total tokens for returned/filtered tools
            * tokens_saved: Number of tokens saved by filtering
            * savings_percentage: Percentage of tokens saved (0-100)

Example:
1) User query: "Find good restaurants in San Jose, California"
This query requires web search. Call find_tool with tool_description="search the web".

2) User query: "Get details of an issue in stacklok/toolhive github repository"
This query requires fetching issue details from github. Call find_tool with
tool_description="Get issue details from GitHub".
"
  Clarity:      9/10 - The description clearly explains what the tool does (find tools from running servers), when to use it (for discovering available tools and capabilities), and how to interpret the output (details the returned dictionary structure with tools and token metrics).
  Completeness: 10/10 - The description is highly complete, covering all aspects: purpose, usage scenarios, detailed parameter explanations, return value structure, and multiple practical usage examples. The input schema clearly defines required parameters.
  Conciseness:  8/10 - The description is mostly concise but could be slightly more compact. The examples section, while valuable, could be condensed without losing important information.
  Suggested:    "Find and return tools from RUNNING servers that match your requested capabilities. Searches only currently running MCP servers (use search_registry() for offline servers).

Args:
    tool_description: Task description (e.g., "web search", "analyze CSV file")
    tool_keywords: Space-separated search keywords (e.g., "github issues", "postgres query")

Returns:
    - List of matching tools with names, descriptions, server names, parameters, and examples
    - Token efficiency metrics showing baseline, returned, and saved token counts

Use this to discover available tools for specific tasks or verify capabilities before solving problems. Falls back to search_registry() if no matches found."

Tool: call_tool
  Description: "
Execute a specific tool with the provided parameters.

Use this function to:
- Run a tool after identifying it with find_tool()
- Execute operations that require specific MCP server functionality
- Perform actions that go beyond your built-in capabilities

Args:
    server_name: The name of the MCP server that provides the tool
                (obtain this from find_tool() results - it's the mcp_server_name field)
    tool_name: The name of the tool to execute
              (obtain this from find_tool() results - it's the tool's name field)
    parameters: Dictionary of arguments required by the tool
               (structure must match the tool's schema from find_tool())

Returns:
    CallToolResult: The output from the tool execution, which may include:
                   - Success/failure status
                   - Result data or content
                   - Error messages if execution failed

Important: Always use find_tool() first to get the correct server_name and tool_name
          and parameter schema before calling this function.
"
  Clarity:      9/10 - The description clearly explains the tool's purpose, when to use it, and its role in executing other tools. The input requirements and return values are well explained with clear structure
  Completeness: 8/10 - The description covers key aspects including purpose, usage, parameters, and returns. It includes important usage notes about using find_tool() first. However, it could benefit from a concrete usage example
  Conciseness:  9/10 - The description is well-organized and concise, with no unnecessary information. Each section serves a clear purpose in explaining the tool's functionality
  Suggested:    "Execute a specific tool with the provided parameters.

Use this function to run tools identified through find_tool() that require specific MCP server functionality or go beyond built-in capabilities.

Required Parameters:
- server_name: MCP server name (from find_tool() mcp_server_name field)
- tool_name: Name of tool to execute (from find_tool() name field)
- parameters: Tool arguments matching schema from find_tool()

Returns CallToolResult containing:
- Success/failure status
- Result data/content
- Error messages (if failed)

Example:
result = call_tool(
    server_name="server1",
    tool_name="process_data",
    parameters={"input": "data.txt"}
)

Note: Always use find_tool() first to get correct server name, tool name, and parameter schema."

Tool: list_tools
  Description: "
List all available tools across all MCP servers.

Use this function when you need to:
- See all tools available in the current environment
- Browse the complete catalog of available tools
- Get an overview of all capabilities without filtering

Returns:
    ListToolsResult: All available tools, including:
                    - Tool names and descriptions
                    - Server names (in the mcp_server_name field)
                    - Required parameters and schemas
                    - Usage examples where applicable
"
  Clarity:      9/10 - The description clearly states what the tool does (list all tools), when to use it (for browsing catalog, getting overview), and what to expect in the output (tool names, descriptions, server names, parameters, schemas, examples).
  Completeness: 8/10 - The description covers main functionality, use cases, and output structure well. The input schema is appropriately empty since no parameters are needed. However, it could benefit from a simple example of the returned data structure.
  Conciseness:  10/10 - The description is very concise while covering all essential information. It uses bullet points effectively and has no unnecessary information.
  Suggested:    "List all available tools across all MCP servers.

Use this function when you need to:
- See all tools available in the current environment
- Browse the complete catalog of available tools
- Get an overview of all capabilities without filtering

Returns a ListToolsResult object containing:
- Tool names and descriptions
- Server names (mcp_server_name)
- Required parameters and schemas
- Usage examples where applicable

Example return structure:
{
  "tools": [
    {
      "name": "tool_name",
      "description": "Tool description",
      "mcp_server_name": "server_name",
      "parameters": {...},
      "examples": [...]
    },
    ...
  ]
}"

✓ Evaluated 3 tool(s)
(mcp-tef) nigels-MacBook-Pro:mcp-tef nigel$ 

But is a server with only three tools enough to validate this?

Should we raise the default timeout substantially?
Or should we use a session ID so that the evaluation can run asynchronously?
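The session-ID option amounts to a submit-then-poll pattern. The server would start the evaluation in the background and return an ID immediately. The CLI would then make short status requests until the job finishes, so no single request needs a long timeout. Below is a minimal in-process sketch using asyncio. JobStore, submit, status, fake_evaluate and poll_until_done are all hypothetical names, not mcp-tef's API. A real version would expose submit and status as HTTP endpoints and persist jobs.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class Job:
    status: str = "pending"  # pending -> running -> done | failed
    result: Any = None
    error: Optional[str] = None


class JobStore:
    """In-memory job registry keyed by a session/job ID (hypothetical, not mcp-tef's API)."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._tasks: set[asyncio.Task] = set()

    def submit(self, work: Callable[[], Awaitable[Any]]) -> str:
        """Start work in the background and return its ID immediately."""
        job_id = uuid.uuid4().hex
        job = Job()
        self._jobs[job_id] = job

        async def runner() -> None:
            job.status = "running"
            try:
                job.result = await work()
                job.status = "done"
            except Exception as exc:  # record the failure for the poller
                job.error = str(exc)
                job.status = "failed"

        task = asyncio.create_task(runner())
        self._tasks.add(task)  # hold a reference so the task is not garbage-collected
        task.add_done_callback(self._tasks.discard)
        return job_id

    def status(self, job_id: str) -> Job:
        return self._jobs[job_id]


async def fake_evaluate(n_tools: int) -> dict[str, int]:
    """Stand-in for the slow LLM evaluation: a short sleep per tool."""
    for _ in range(n_tools):
        await asyncio.sleep(0.01)
    return {"evaluated": n_tools}


async def poll_until_done(store: JobStore, n_tools: int, interval: float = 0.02) -> Any:
    job_id = store.submit(lambda: fake_evaluate(n_tools))
    # Each status check is quick, so no single request needs a long timeout.
    while (job := store.status(job_id)).status in ("pending", "running"):
        await asyncio.sleep(interval)
    if job.status == "failed":
        raise RuntimeError(job.error)
    return job.result


if __name__ == "__main__":
    print(asyncio.run(poll_until_done(JobStore(), n_tools=26)))
```

With this design, the CLI's --timeout would bound each poll rather than the whole evaluation. The total run time could then grow with the number of tools without any client-side limit being hit.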

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
