fix tests

0d304d8
Merged

Log eventloop lag during vf-eval #687

Cursor / Cursor Bugbot completed Jan 6, 2026 in 7m 53s

Bugbot Review

Bugbot Analysis Progress (7m 54s elapsed)

✅ Gathered PR context (2s)
✅ Analyzed code changes (1s)
✅ Completed bug detection — 2 potential bugs found (7m 41s)
✅ Validation and filtering completed (0s)
✅ Posted analysis results — 2 bugs reported (10s)
✅ Analysis completed successfully (0s)

Final Result: Bugbot completed review and found 2 potential issues

Request ID: serverGenReqId_7aec3a30-3e7c-4f59-a2aa-38f9724e2297

Details

Per-tool metrics not tracked for subclass-added tools

The ToolMonitorRubric is created in ToolEnv.__init__ before subclasses add their tools. When SandboxEnv or PythonEnv calls super().__init__(), self.tools is still empty, so ToolMonitorRubric captures an empty tool_names list. Tools like bash and python are added via add_tool() only after the rubric has been created, so their per-tool call counts won't be tracked. This is a regression from the old pattern in math_python.py, where ToolRubric was explicitly created after the environment was fully initialized.

verifiers/envs/tool_env.py#L77-L78

self.add_rubric(ToolMonitorRubric(tools=self.tools))

verifiers/envs/sandbox_env.py#L149-L150

)
self.add_rubric(SandboxMonitorRubric())

verifiers/envs/python_env.py#L201-L206

)
self.add_rubric(PythonMonitorRubric())
self.add_tool(
    self.python, args_to_skip=["sandbox_id", "sandbox_state", "python_state"]
)
self.remove_tool(self.bash)  # omit from agent tool list
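
The ordering problem described above can be reproduced with a minimal sketch. The classes below are simplified, hypothetical stand-ins that share names with the ones in the report; they are not the actual verifiers implementations. `LazyToolMonitorRubric` is an invented name illustrating one possible fix (reading the tool list on demand rather than at construction time), not necessarily the fix that was adopted.

```python
class ToolMonitorRubric:
    """Hypothetical stand-in: snapshots tool names once, at construction."""

    def __init__(self, tools):
        self.tool_names = [tool.__name__ for tool in tools]


class LazyToolMonitorRubric:
    """Hypothetical alternative: looks up the env's tools when asked."""

    def __init__(self, env):
        self.env = env

    @property
    def tool_names(self):
        return [tool.__name__ for tool in self.env.tools]


class ToolEnv:
    """Simplified stand-in for the base environment."""

    def __init__(self, tools=None):
        self.tools = list(tools or [])
        # Both rubrics are created here, before any subclass adds tools.
        self.eager_rubric = ToolMonitorRubric(tools=self.tools)
        self.lazy_rubric = LazyToolMonitorRubric(self)

    def add_tool(self, tool):
        self.tools.append(tool)


class SandboxEnv(ToolEnv):
    """Simplified stand-in: registers its tool after super().__init__()."""

    def __init__(self):
        super().__init__()
        self.add_tool(self.bash)

    def bash(self, command: str) -> str:
        return ""


env = SandboxEnv()
print(env.eager_rubric.tool_names)  # the snapshot was taken too early
print(env.lazy_rubric.tool_names)   # resolved after add_tool() ran
```

In this sketch the eager rubric never sees `bash`, which mirrors the reported regression. Deferring the lookup, or creating the rubric only once the subclass has finished registering tools, would avoid it.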
