Fix vf-eval concurrent rollouts label to show effective capped value#836

Merged
willccbb merged 1 commit into main from codex/fix-concurrent-rollouts-label-calculation
Feb 6, 2026
Conversation


@willccbb willccbb commented Feb 6, 2026

Motivation

  • The eval UI could display a concurrent rollouts value larger than the actual runtime concurrency, because it showed config.max_concurrent directly instead of the effective concurrency used at runtime, and it did not cap the value by the total number of rollouts.
  • The intent is to make the display reflect the same effective concurrency calculation used by evaluate(...) (respecting rollouts_per_example and independent_scoring) and avoid inflated labels.

Description

  • Add a static helper EvalDisplay._display_max_concurrent(config, total_rollouts) that computes the effective concurrency by applying rollout-per-example scaling (when independent_scoring is False), using math.ceil for division, and capping the result to total_rollouts when appropriate.
  • Use the new helper in _make_env_panel to render the concurrent rollouts label instead of showing config.max_concurrent directly, and import math for the calculation.
  • Add focused unit tests in tests/test_eval_display.py that assert capping to total_rollouts, correct scaling by rollouts_per_example, and that independent_scoring=True preserves the unscaled value.
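The description above can be sketched as follows. The helper name, config fields, and scaling/capping rules come from the PR description; the exact class layout and function body are assumptions, not the merged implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalConfig:
    # Field names follow the PR description; the real config class may differ.
    max_concurrent: int
    rollouts_per_example: int = 1
    independent_scoring: bool = True


def display_max_concurrent(config: EvalConfig, total_rollouts: int) -> int:
    """Minimal sketch of the effective-concurrency calculation described above."""
    max_concurrent = config.max_concurrent
    if not config.independent_scoring and config.rollouts_per_example > 1:
        # All rollouts for one example are scored together, so concurrency is
        # measured in example groups rather than individual rollouts.
        max_concurrent = math.ceil(max_concurrent / config.rollouts_per_example)
    if max_concurrent > 0 and total_rollouts > 0:
        # Never display more concurrency than there is work to run.
        return min(max_concurrent, total_rollouts)
    return max_concurrent
```

For example, with max_concurrent=8 and only 3 total rollouts, the label caps at 3; with independent_scoring=True the configured value is preserved up to the cap.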

Testing

  • Ran uv run pytest tests/test_eval_display.py and the tests passed (3 passed).
  • Ran uv run ruff check verifiers/utils/eval_display.py tests/test_eval_display.py and style checks passed.

Codex Task


Note

Low Risk
UI-only display logic change with small, well-covered arithmetic adjustments; no runtime evaluation behavior is modified.

Overview
Fixes the eval UI’s concurrent rollouts label to reflect effective runtime concurrency rather than raw config.max_concurrent.

Adds EvalDisplay._display_max_concurrent() to scale concurrency by rollouts_per_example when independent_scoring is disabled (using ceil) and to cap the displayed value to total_rollouts, and wires this value into the panel rendering. Includes unit tests covering capping, scaling, and the independent_scoring=True exception.

Written by Cursor Bugbot for commit c31ec0d.

@willccbb willccbb merged commit a7442ef into main Feb 6, 2026
5 of 6 checks passed

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


max_concurrent = math.ceil(max_concurrent / config.rollouts_per_example)

if max_concurrent > 0 and total_rollouts > 0:
    return min(max_concurrent, total_rollouts)

Capping uses wrong unit when independent_scoring is false

Medium Severity

When independent_scoring=False and rollouts_per_example > 1, the function divides max_concurrent by rollouts_per_example to get concurrent example groups, but then caps against total_rollouts (measured in rollouts). This unit mismatch causes incorrect display values. For example, with max_concurrent=100, rollouts_per_example=4, and num_examples=5, the display shows 20 instead of 5. The cap should use total_rollouts // config.rollouts_per_example when scaling has been applied.
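The mismatch the review describes can be reproduced with a small standalone sketch. This is a hypothetical free function illustrating the suggested fix (capping by total_rollouts // rollouts_per_example once scaling has been applied); the real helper lives on EvalDisplay and takes a config object.

```python
import math


def display_max_concurrent_fixed(max_concurrent: int, rollouts_per_example: int,
                                 total_rollouts: int, independent_scoring: bool) -> int:
    """Hypothetical version of the helper with the cap in matching units."""
    cap = total_rollouts
    if not independent_scoring and rollouts_per_example > 1:
        # Concurrency is now counted in example groups...
        max_concurrent = math.ceil(max_concurrent / rollouts_per_example)
        # ...so the cap must be converted to example groups as well.
        cap = total_rollouts // rollouts_per_example
    if max_concurrent > 0 and cap > 0:
        return min(max_concurrent, cap)
    return max_concurrent


# The review's example: max_concurrent=100, rollouts_per_example=4,
# num_examples=5 (so total_rollouts=20). The buggy cap yields
# min(ceil(100/4), 20) = 20; capping in example groups yields
# min(25, 20 // 4) = 5.
```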
