Fix vf-eval concurrent rollouts label to show effective capped value#836

Merged
willccbb merged 1 commit into main from codex/fix-concurrent-rollouts-label-calculation
Feb 6, 2026
Conversation


@willccbb willccbb commented Feb 6, 2026

Motivation

  • The eval UI could display a concurrent rollouts value larger than the actual runtime concurrency, because it showed config.max_concurrent directly instead of the effective concurrency used at runtime, and it did not cap the value by the total number of rollouts.
  • The intent is to make the display reflect the same effective concurrency calculation used by evaluate(...) (respecting rollouts_per_example and independent_scoring) and avoid inflated labels.

Description

  • Add a static helper EvalDisplay._display_max_concurrent(config, total_rollouts) that computes the effective concurrency by applying rollout-per-example scaling (when independent_scoring is False), using math.ceil for division, and capping the result to total_rollouts when appropriate.
  • Use the new helper in _make_env_panel to render the concurrent rollouts label instead of showing config.max_concurrent directly, and import math for the calculation.
  • Add focused unit tests in tests/test_eval_display.py that assert capping to total_rollouts, correct scaling by rollouts_per_example, and that independent_scoring=True preserves the unscaled value.
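The description above can be sketched as follows. The helper name, config fields, and scaling/capping rules come from the PR description; the exact class layout and function body are assumptions, not the merged implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalConfig:
    # Field names follow the PR description; the real config class may differ.
    max_concurrent: int
    rollouts_per_example: int = 1
    independent_scoring: bool = True


def display_max_concurrent(config: EvalConfig, total_rollouts: int) -> int:
    """Minimal sketch of the effective-concurrency calculation described above."""
    max_concurrent = config.max_concurrent
    if not config.independent_scoring and config.rollouts_per_example > 1:
        # All rollouts for one example are scored together, so concurrency is
        # measured in example groups rather than individual rollouts.
        max_concurrent = math.ceil(max_concurrent / config.rollouts_per_example)
    if max_concurrent > 0 and total_rollouts > 0:
        # Never display more concurrency than there is work to run.
        return min(max_concurrent, total_rollouts)
    return max_concurrent
```

For example, with max_concurrent=8 and only 3 total rollouts, the label caps at 3; with independent_scoring=True the configured value is preserved up to the cap.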

Testing

  • Ran uv run pytest tests/test_eval_display.py and the tests passed (3 passed).
  • Ran uv run ruff check verifiers/utils/eval_display.py tests/test_eval_display.py and style checks passed.

Codex Task


Note

Low Risk
UI-only display logic change with small, well-covered arithmetic adjustments; no runtime evaluation behavior is modified.

Overview
Fixes the eval UI’s concurrent rollouts label to reflect effective runtime concurrency rather than raw config.max_concurrent.

Adds EvalDisplay._display_max_concurrent() to scale concurrency by rollouts_per_example when independent_scoring is disabled (using ceil) and to cap the displayed value to total_rollouts, and wires this value into the panel rendering. Includes unit tests covering capping, scaling, and the independent_scoring=True exception.

Written by Cursor Bugbot for commit c31ec0d.

@willccbb willccbb merged commit a7442ef into main Feb 6, 2026
5 of 6 checks passed

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


max_concurrent = math.ceil(max_concurrent / config.rollouts_per_example)

if max_concurrent > 0 and total_rollouts > 0:
    return min(max_concurrent, total_rollouts)

Capping uses wrong unit when independent_scoring is false

Medium Severity

When independent_scoring=False and rollouts_per_example > 1, the function divides max_concurrent by rollouts_per_example to get concurrent example groups, but then caps against total_rollouts (measured in rollouts). This unit mismatch causes incorrect display values. For example, with max_concurrent=100, rollouts_per_example=4, and num_examples=5, the display shows 20 instead of 5. The cap should use total_rollouts // config.rollouts_per_example when scaling has been applied.
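The mismatch the review describes can be reproduced with a small standalone sketch. This is a hypothetical free function illustrating the suggested fix (capping by total_rollouts // rollouts_per_example once scaling has been applied); the real helper lives on EvalDisplay and takes a config object.

```python
import math


def display_max_concurrent_fixed(max_concurrent: int, rollouts_per_example: int,
                                 total_rollouts: int, independent_scoring: bool) -> int:
    """Hypothetical version of the helper with the cap in matching units."""
    cap = total_rollouts
    if not independent_scoring and rollouts_per_example > 1:
        # Concurrency is now counted in example groups...
        max_concurrent = math.ceil(max_concurrent / rollouts_per_example)
        # ...so the cap must be converted to example groups as well.
        cap = total_rollouts // rollouts_per_example
    if max_concurrent > 0 and cap > 0:
        return min(max_concurrent, cap)
    return max_concurrent


# The review's example: max_concurrent=100, rollouts_per_example=4,
# num_examples=5 (so total_rollouts=20). The buggy cap yields
# min(ceil(100/4), 20) = 20; capping in example groups yields
# min(25, 20 // 4) = 5.
```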
