Problem
The logs tool provides aggregate metrics per episode: total_tokens,
total_estimated_cost, mcp_failure_count. There is no breakdown by individual
tool call.
This means consumers cannot answer:
- "Which MCP tool consumes the most tokens?"
- "Which tool call failed and why?"
- "What is the latency distribution per tool?"
Current behavior
Only aggregates are reported, for example: total_tokens: 12840, mcp_failure_count: 1
Expected behavior
Include a tool_calls array per episode:
{
  "tool_calls": [
    {
      "tool": "get_file_contents",
      "server": "github",
      "tokens": 2400,
      "duration_ms": 350,
      "status": "success"
    },
    {
      "tool": "search_code",
      "server": "github",
      "tokens": 5200,
      "duration_ms": 1200,
      "status": "success"
    },
    {
      "tool": "create_pull_request",
      "server": "github",
      "tokens": 800,
      "duration_ms": 600,
      "status": "error",
      "error": "403 Resource not accessible by integration"
    }
  ]
}
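To illustrate how a consumer might use this, here is a minimal Python sketch that answers the three questions above from the proposed tool_calls array. The helper name summarize_tool_calls and the "server/tool" grouping key are hypothetical and only for illustration; they are not part of the logs tool. The sketch assumes the field names shown in the example payload.

```python
import json
from collections import defaultdict

# Sample episode using the shape proposed in this issue.
EPISODE_JSON = """
{
  "tool_calls": [
    {"tool": "get_file_contents", "server": "github", "tokens": 2400,
     "duration_ms": 350, "status": "success"},
    {"tool": "search_code", "server": "github", "tokens": 5200,
     "duration_ms": 1200, "status": "success"},
    {"tool": "create_pull_request", "server": "github", "tokens": 800,
     "duration_ms": 600, "status": "error",
     "error": "403 Resource not accessible by integration"}
  ]
}
"""


def summarize_tool_calls(episode):
    """Hypothetical helper: group per-call records by 'server/tool'.

    Returns (tokens per tool, list of durations per tool, list of failures).
    """
    tokens_by_tool = defaultdict(int)
    durations_by_tool = defaultdict(list)
    failures = []
    for call in episode.get("tool_calls", []):
        key = f'{call["server"]}/{call["tool"]}'
        tokens_by_tool[key] += call.get("tokens", 0)
        durations_by_tool[key].append(call.get("duration_ms", 0))
        if call.get("status") == "error":
            failures.append((key, call.get("error", "unknown error")))
    return dict(tokens_by_tool), dict(durations_by_tool), failures


episode = json.loads(EPISODE_JSON)
tokens_by_tool, durations_by_tool, failures = summarize_tool_calls(episode)

# Which MCP tool consumes the most tokens?
top_tool = max(tokens_by_tool, key=tokens_by_tool.get)
print("most tokens:", top_tool, tokens_by_tool[top_tool])

# Which tool call failed and why?
for tool, reason in failures:
    print("failed:", tool, "-", reason)

# What is the latency distribution per tool?
for tool, durations in durations_by_tool.items():
    print("latency:", tool, "min", min(durations), "max", max(durations))
```

None of these questions can be answered from total_tokens and mcp_failure_count alone, which is the motivation for the per-call breakdown.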