Improve vf-eval display by snimu · Pull Request #809 · PrimeIntellect-ai/verifiers

snimu · 2026-02-01T20:30:09Z

Description

Three updates to the way vf-eval displays progress and results:

The time on the progress bar updates live every second
The Evaluation Summary is moved to the very end of the results section
The reward distribution histogram now uses vertical instead of horizontal bars
Two other small changes

Progress timer

The time shown on the very right of the progress bar now updates every second. The time taken for a rollout is an important metric, and we aim to update metrics as soon as they come in. For long-running rollouts, an increasing timer is very useful.

One issue remains: the interval between timer updates isn't always exactly 1 second. It's close, though, and a big improvement over no timer updates at all.
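To illustrate the idea (this is not the code from this PR), here is a minimal sketch of a periodic refresh loop in asyncio. The names `format_elapsed`, `refresh_loop`, and `run_with_timer` are hypothetical and do not come from verifiers. The loop sleeps until absolute deadlines rather than for a fixed second, which is one possible way to stop small delays from accumulating; individual ticks can still be late if the event loop is busy.

```python
import asyncio
import contextlib
import time


def format_elapsed(seconds: float) -> str:
    """Format elapsed seconds as e.g. '3s' or '1m 05s' (hypothetical helper)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{secs}s" if minutes == 0 else f"{minutes}m {secs:02d}s"


async def refresh_loop(render, interval: float = 1.0) -> None:
    """Call render(elapsed_seconds) about every `interval` seconds until cancelled."""
    start = time.monotonic()
    tick = 0
    while True:
        render(time.monotonic() - start)
        tick += 1
        # Sleep until the next absolute deadline so that lateness in one
        # tick does not push every later tick back as well.
        delay = start + tick * interval - time.monotonic()
        await asyncio.sleep(max(0.0, delay))


async def run_with_timer(work, render, interval: float = 1.0):
    """Await `work` while a background task keeps refreshing the display."""
    refresher = asyncio.create_task(refresh_loop(render, interval))
    try:
        return await work
    finally:
        # Stop the refresher and wait for it to finish cancelling.
        refresher.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await refresher
```

In a sketch like this, `render` would redraw the progress bar with `format_elapsed(elapsed)`, so the timer keeps advancing even when no rollout has finished yet.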

Below is an example. Note that the progress bar shows "3s" despite no rollout having finished yet:

Move Evaluation Summary to the end

The Evaluation Summary contains important high-level statistics like examples, rollouts, errors, reward, and time, but it was previously displayed before the example prompt and completion. For long rollouts, this meant that users had to scroll up very far to see the Evaluation Summary, and might miss it entirely.

Moving it to the bottom of the results section doesn't take much space, but makes this important information immediately visible. This is what it looks like now:

Vertical-bar histogram

This is what the histogram in reward distribution looked like before:

And this is what it looks like after this PR:

While it takes up a bit more space now, it's also more immediately recognizable as a histogram.
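As a rough illustration of the vertical layout (not the implementation in `eval_display.py`), the sketch below renders bucket counts as columns of text. The function name `vertical_histogram`, its parameters, and the choice to put the counts in a row above the bars are all assumptions made for this example.

```python
def vertical_histogram(counts, labels, height=5, bar="█"):
    """Render counts as a vertical-bar text histogram (hypothetical helper).

    Each bucket gets one column: its count on top, the bar below it,
    and its label at the bottom. Counts and labels are centered.
    """
    if len(counts) != len(labels):
        raise ValueError("counts and labels must have the same length")

    peak = max(counts, default=0)
    # All columns share one width so the bars line up.
    width = max([len(str(c)) for c in counts] + [len(l) for l in labels] + [1])

    # Scale bars to `height` rows; any non-empty bucket gets at least one row.
    heights = []
    for c in counts:
        scaled = 0 if peak == 0 else round(c / peak * height)
        heights.append(max(scaled, 1) if c > 0 else 0)

    lines = [" ".join(str(c).center(width) for c in counts)]
    for level in range(height, 0, -1):
        cells = [bar * width if h >= level else " " * width for h in heights]
        lines.append(" ".join(cells))
    lines.append(" ".join(l.center(width) for l in labels))
    return "\n".join(line.rstrip() for line in lines)
```

Calling `print(vertical_histogram([1, 4, 2], ["0.0", "0.5", "1.0"], height=4))` draws three columns whose heights are proportional to the counts. The output always uses `height + 2` lines, which is the extra vertical space mentioned above.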

Other changes

Removed the uv.sources entry for the prime-tunnel package. The pinned branch of prime no longer exists, which broke the build, but the prime-tunnel package has since been released, so a pinned source is no longer needed.
Fixed a ty error in browser_env to make the checks pass

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Note

Low Risk
Primarily display/UX changes plus a small async/thread invocation fix; minimal impact on core evaluation logic, but could affect TUI refresh behavior and shutdown if cancellation handling is wrong.

Overview
Improves vf-eval UX by adding a periodic (1s) refresh loop so progress-bar timers update live during long-running rollouts, and by reordering the final output to show per-environment details/errors/save paths first and the Evaluation Summary table last.

Updates the reward distribution display to a vertical-bar histogram, fixes an asyncio/thread interaction in CUAMode.verify_server_connection, and removes the pinned prime-tunnel git source from pyproject.toml now that the package is released.

Written by Cursor Bugbot for commit 0604b2e.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-01T20:39:22Z

pyproject.toml

 ]

-[tool.uv.sources]
-prime-tunnel = { git = "https://github.com/PrimeIntellect-ai/prime.git", branch = "feature/tunnel", subdirectory = "packages/prime-tunnel" }


Removed git source for prime-tunnel dependency

High Severity

The [tool.uv.sources] section that specified the git source for prime-tunnel was removed, but the package is still listed as a dependency and is actively imported in cli_agent_env.py and rlm_env.py. This change is unrelated to the PR's purpose of improving vf-eval display and will likely cause installation failures if prime-tunnel is not available on PyPI.

This was intentional: the feature/tunnel branch was merged into main and deleted, so verifiers no longer builds with this source in place. The prime-tunnel package has since been published, so we can simply remove the uv.sources entry and everything works.

verifiers/utils/eval_display.py

mikasenghaas

lgtm

snimu added 3 commits January 31, 2026 23:56

Refresh eval display timer and move summary to bottom

cf0e676

Fix CUA health check submit and restore vertical histogram

b2cbf23

remove uv.sources for prime-tunnel

55c8319

cursor bot reviewed Feb 1, 2026

Center histogram labels and counts

0604b2e

mikasenghaas approved these changes Feb 1, 2026

snimu merged commit dccb7ba into main Feb 2, 2026
6 checks passed

snimu merged 4 commits into main from sebastian/vf-eval-display-2026-01-31
