Improve vf-eval display#809

Merged
snimu merged 4 commits into main from sebastian/vf-eval-display-2026-01-31
Feb 2, 2026

@snimu (Contributor) commented Feb 1, 2026

Description

Three updates to the way vf-eval displays progress and results, plus two smaller changes:

  • The time on the progress bar updates live every second
  • The Evaluation Summary is moved to the very end of the results section
  • The reward distribution histogram now uses vertical instead of horizontal bars
  • Two other small changes

Progress timer

The time shown at the far right of the progress bar now updates every second. The time taken for a rollout is an important metric, and we aim to update metrics as soon as they come in. For long-running rollouts, a steadily increasing timer is very useful.

One issue remains: the interval between timer updates isn't always exactly 1 second. It's close, though, and a big improvement over no timer updates.

Below is an example. Note that the progress bar shows "3s" despite no rollout having finished yet:

image
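
The live timer described above can be sketched as a background asyncio task that reports the elapsed time on a fixed cadence while the rollouts run. This is a minimal illustration, not the code from this PR: `refresh_periodically`, `run_with_live_timer`, and the `refresh` callback are hypothetical names, and sleeping until absolute deadlines is just one assumed way to keep the intervals close to one second.

```python
import asyncio
import contextlib
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def refresh_periodically(
    refresh: Callable[[float], None], interval: float = 1.0
) -> None:
    """Call refresh(elapsed_seconds) about every `interval` seconds until cancelled."""
    start = time.monotonic()
    tick = 0
    while True:
        tick += 1
        # Sleep until the next absolute deadline so small delays do not accumulate.
        delay = start + tick * interval - time.monotonic()
        await asyncio.sleep(max(0.0, delay))
        refresh(time.monotonic() - start)


async def run_with_live_timer(
    work: Awaitable[T], refresh: Callable[[float], None], interval: float = 1.0
) -> T:
    """Await `work` while a background task keeps the displayed timer fresh."""
    refresher = asyncio.create_task(refresh_periodically(refresh, interval))
    try:
        return await work
    finally:
        # Stop the refresh loop and wait for it to finish so shutdown stays clean.
        refresher.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await refresher
```

In a real TUI the `refresh` callback would redraw the progress bar. Cancelling and then awaiting the refresher in `finally` is what keeps the loop from outliving the evaluation.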

Move Evaluation Summary to the end

The Evaluation Summary contains important high-level statistics like examples, rollouts, errors, reward, and time, but it was previously displayed before the example prompt and completion. For long rollouts, this meant that users had to scroll up very far to see the Evaluation Summary, and might miss it entirely.

Moving it to the bottom of the results section costs little space and makes this important information immediately visible. This is what it looks like now:

image

Vertical-bar histogram

This is what the histogram in reward distribution looked like before:

image

And this is what it looks like after this PR:

image

While it takes up a bit more space now, it's also more immediately recognizable as a histogram.
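
To illustrate the idea of vertical bars in a terminal, here is a small plain-text sketch. It is not the rendering code from this PR: the function names, the fixed reward range of 0 to 1, the bin count, and the bar height are all assumptions chosen for the example.

```python
from typing import List, Sequence


def bin_counts(
    values: Sequence[float], bins: int = 5, lo: float = 0.0, hi: float = 1.0
) -> List[int]:
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that values outside [lo, hi] land in the first or last bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return counts


def bar_heights(counts: Sequence[int], height: int = 6) -> List[int]:
    """Scale counts to bar heights; any non-empty bin gets at least one row."""
    peak = max(counts, default=0) or 1
    return [0 if c == 0 else max(1, round(c * height / peak)) for c in counts]


def vertical_histogram(
    values: Sequence[float],
    bins: int = 5,
    height: int = 6,
    lo: float = 0.0,
    hi: float = 1.0,
) -> str:
    """Render the histogram top-down, one text row per height level."""
    heights = bar_heights(bin_counts(values, bins, lo, hi), height)
    cell = 5  # characters per column, wide enough for labels like "0.8"
    rows = []
    for level in range(height, 0, -1):
        rows.append("".join(("█" if h >= level else "").center(cell) for h in heights))
    # Final row: the lower edge of each bin as an x-axis label.
    step = (hi - lo) / bins
    rows.append("".join(f"{lo + i * step:.1f}".center(cell) for i in range(bins)))
    return "\n".join(rows)


if __name__ == "__main__":
    print(vertical_histogram([0.0, 0.1, 0.5, 0.9, 1.0, 1.0]))
```

Drawing row by row from the top is what makes the bars vertical, and it also accounts for the extra space: the output needs `height` rows plus a label row regardless of how many bins there are.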

Other changes

  • Remove the uv.sources entry for the prime-tunnel package. The branch of prime that the source was pinned to no longer exists, which broke the build. The prime-tunnel package has since been released, so a pinned source is no longer needed
  • Fix a ty error in browser_env so that the checks pass

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Low Risk
Primarily display/UX changes plus a small async/thread invocation fix; minimal impact on core evaluation logic, but could affect TUI refresh behavior and shutdown if cancellation handling is wrong.

Overview
Improves vf-eval UX by adding a periodic (1s) refresh loop so progress-bar timers update live during long-running rollouts, and by reordering the final output to show per-environment details/errors/save paths first and the Evaluation Summary table last.

Updates the reward distribution display to a vertical-bar histogram, fixes an asyncio/thread interaction in CUAMode.verify_server_connection, and removes the pinned prime-tunnel git source from pyproject.toml now that the package is released.

Written by Cursor Bugbot for commit 0604b2e. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

[tool.uv.sources]
prime-tunnel = { git = "https://github.com/PrimeIntellect-ai/prime.git", branch = "feature/tunnel", subdirectory = "packages/prime-tunnel" }

Removed git source for prime-tunnel dependency

High Severity

The [tool.uv.sources] section that specified the git source for prime-tunnel was removed, but the package is still listed as a dependency and is actively imported in cli_agent_env.py and rlm_env.py. This change is unrelated to the PR's purpose of improving vf-eval display and will likely cause installation failures if prime-tunnel is not available on PyPI.

@snimu (Contributor, Author) replied:

This was intentional: the feature/tunnel branch was merged into main and deleted, so verifiers no longer builds with this source in it. The tunnel package has also been published now, so we can simply remove the uv.sources entry and it works.

@mikasenghaas (Member) left a comment

lgtm

@snimu snimu merged commit dccb7ba into main Feb 2, 2026
6 checks passed