Cursor Bugbot has reviewed your changes and found 2 potential issues.
[tool.uv.sources]
prime-tunnel = { git = "https://github.com/PrimeIntellect-ai/prime.git", branch = "feature/tunnel", subdirectory = "packages/prime-tunnel" }
Removed git source for prime-tunnel dependency
High Severity
The [tool.uv.sources] section that specified the git source for prime-tunnel was removed, but the package is still listed as a dependency and is actively imported in cli_agent_env.py and rlm_env.py. This change is unrelated to the PR's purpose of improving vf-eval display and will likely cause installation failures if prime-tunnel is not available on PyPI.
This was intentional: the feature/tunnel branch was merged into main and deleted, so verifiers no longer builds with this source in place. The prime-tunnel package has also been published since then, so we can simply remove the uv.sources entry and it works.


Description
Three updates to the way vf-eval displays progress and results:
Progress timer
The time shown on the very right of the progress bar now updates every second. The time taken for a rollout is an important metric, and we aim to update metrics as soon as they come in. For long-running rollouts, an increasing timer is very useful.
There is a remaining issue: the interval between timer updates isn't always exactly 1 second. It is close, though, and a big improvement over no timer updates.
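For readers unfamiliar with the pattern, here is a minimal sketch of a periodic refresh loop running next to the actual work. It is not the code from this PR: `refresh_loop`, `run_with_timer`, and the `render` callback are hypothetical names chosen for illustration. The elapsed time is read from a monotonic clock on every tick, so the displayed value stays accurate even when a tick arrives late.

```python
import asyncio
import time


async def refresh_loop(render, interval: float = 1.0) -> None:
    """Call render(elapsed_seconds) roughly once per interval until cancelled."""
    start = time.monotonic()
    while True:
        # Compute elapsed time from the clock instead of counting ticks,
        # so a late tick still shows the correct value.
        render(time.monotonic() - start)
        await asyncio.sleep(interval)


async def run_with_timer(work, render, interval: float = 1.0):
    """Await `work` while a background task keeps the timer display fresh."""
    refresher = asyncio.create_task(refresh_loop(render, interval))
    try:
        return await work
    finally:
        # Stop the refresher and wait for it to finish cancelling.
        refresher.cancel()
        try:
            await refresher
        except asyncio.CancelledError:
            pass
```

In a loop like this, each period is the sleep interval plus the render time plus any event-loop scheduling delay, which is one plausible reason why ticks would not land exactly 1 second apart.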
Below is an example. Note that the progress bar shows "3s" despite no rollout having finished yet:
Move Evaluation Summary to the end
The Evaluation Summary contains important high-level statistics like examples, rollouts, errors, reward, and time, but it was previously displayed before the example prompt and completion. For long rollouts, this meant that users had to scroll up very far to see the Evaluation Summary, and might miss it entirely.
Moving it to the bottom of the results section doesn't take much space, but makes this important information immediately visible. This is what it looks like now:
Vertical-bar histogram
This is what the histogram in reward distribution looked like before:
And this is what it looks like after this PR:
While it takes up a bit more space now, it's also more immediately recognizable as a histogram.
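To illustrate the idea of a vertical-bar histogram in a terminal, here is a small sketch. It is not the rendering code from this PR: the function name `vertical_histogram`, its parameters, and the assumed reward range of 0.0 to 1.0 are all hypothetical. Each bin becomes one column, drawn from the top row down, with the raw counts printed underneath.

```python
def vertical_histogram(values, bins=5, height=6, lo=0.0, hi=1.0):
    """Return a text histogram of `values` with one 3-character column per bin."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        # Clamp so out-of-range values (and v == hi) land in the edge bins.
        counts[min(max(idx, 0), bins - 1)] += 1

    peak = max(counts) or 1  # avoid dividing by zero when there is no data
    # Scale counts to bar heights; any non-empty bin gets at least one row.
    bars = [0 if c == 0 else max(1, round(c / peak * height)) for c in counts]

    rows = []
    for level in range(height, 0, -1):
        rows.append("".join(f"{'█' if b >= level else ' ':^3}" for b in bars))
    # Label row with the raw count per bin (aligned for counts up to 999).
    rows.append("".join(f"{c:^3}" for c in counts))
    return "\n".join(rows)


if __name__ == "__main__":
    print(vertical_histogram([0.1, 0.1, 0.1, 0.5, 0.9, 1.0], bins=5, height=3))
```

The trade-off matches the one described above: the output needs `height + 1` lines, more than a single-line display, but the shape of the distribution is visible at a glance.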
Other changes
Removed the pinned git source for the prime-tunnel package. The prime branch it was pinned to doesn't exist anymore, which broke the build, but the prime-tunnel package is now released, so we simply don't need a pinned source anymore.
Type of Change
Testing
uv run pytest locally.
Checklist
Note
Low Risk
Primarily display/UX changes plus a small async/thread invocation fix; minimal impact on core evaluation logic, but could affect TUI refresh behavior and shutdown if cancellation handling is wrong.
Overview
Improves vf-eval UX by adding a periodic (1s) refresh loop so progress-bar timers update live during long-running rollouts, and by reordering the final output to show per-environment details/errors/save paths first and the Evaluation Summary table last.
Updates the reward distribution display to a vertical-bar histogram, fixes an asyncio/thread interaction in CUAMode.verify_server_connection, and removes the pinned prime-tunnel git source from pyproject.toml now that the package is released.
Written by Cursor Bugbot for commit 0604b2e.