Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
max_concurrent = math.ceil(max_concurrent / config.rollouts_per_example)

if max_concurrent > 0 and total_rollouts > 0:
    return min(max_concurrent, total_rollouts)
```
Capping uses wrong unit when `independent_scoring` is false

Medium Severity

When `independent_scoring=False` and `rollouts_per_example > 1`, the function divides `max_concurrent` by `rollouts_per_example` to get concurrent example groups, but then caps against `total_rollouts` (measured in rollouts). This unit mismatch produces incorrect display values. For example, with `max_concurrent=100`, `rollouts_per_example=4`, and `num_examples=5`, the display shows 20 instead of 5. The cap should use `total_rollouts // config.rollouts_per_example` when scaling has been applied.
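The fix can be illustrated with a small standalone sketch; the function name and parameter list here are illustrative, not the repo's exact signature:

```python
import math

def effective_display_concurrency(max_concurrent, rollouts_per_example,
                                  total_rollouts, independent_scoring):
    """Return the concurrency value to display, capped in a consistent unit."""
    if not independent_scoring and rollouts_per_example > 1:
        # Concurrency is now measured in example *groups*...
        max_concurrent = math.ceil(max_concurrent / rollouts_per_example)
        # ...so the cap must use the same unit (groups), not raw rollouts.
        cap = total_rollouts // rollouts_per_example
    else:
        cap = total_rollouts
    if max_concurrent > 0 and cap > 0:
        return min(max_concurrent, cap)
    return max_concurrent

# max_concurrent=100, rollouts_per_example=4, num_examples=5 (20 rollouts):
# the buggy cap min(ceil(100/4), 20) displays 20; the fixed cap displays 5.
print(effective_display_concurrency(100, 4, 20, False))  # → 5
```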


Motivation
The eval display previously showed a `concurrent rollouts` value larger than the actual work or runtime concurrency, because it showed `config.max_concurrent` directly instead of the effective concurrency used at runtime and did not cap by total rollouts. The goal is to match what `evaluate(...)` actually does (respecting `rollouts_per_example` and `independent_scoring`) and avoid inflated labels.

Description

- Adds `EvalDisplay._display_max_concurrent(config, total_rollouts)`, which computes the effective concurrency by applying rollout-per-example scaling (when `independent_scoring` is False), using `math.ceil` for the division, and capping the result to `total_rollouts` when appropriate.
- Updates `_make_env_panel` to render the `concurrent rollouts` label from this value instead of showing `config.max_concurrent` directly, and imports `math` for the calculation.
- Adds tests in `tests/test_eval_display.py` that assert capping to `total_rollouts`, correct scaling by `rollouts_per_example`, and that `independent_scoring=True` preserves the unscaled value.

Testing

- Ran `uv run pytest tests/test_eval_display.py`; the tests passed (3 passed).
- Ran `uv run ruff check verifiers/utils/eval_display.py tests/test_eval_display.py`; style checks passed.

Codex Task
Note
Low Risk
UI-only display logic change with small, well-covered arithmetic adjustments; no runtime evaluation behavior is modified.
Overview
Fixes the eval UI’s `concurrent rollouts` label to reflect effective runtime concurrency rather than raw `config.max_concurrent`.

Adds `EvalDisplay._display_max_concurrent()` to scale concurrency by `rollouts_per_example` when `independent_scoring` is disabled (using `ceil`) and to cap the displayed value to `total_rollouts`, then wires this value into the panel rendering. Includes unit tests covering capping, scaling, and the `independent_scoring=True` exception.

Written by Cursor Bugbot for commit c31ec0d.
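The unit tests described above might look roughly like the following pytest-style sketch. `FakeConfig` and the inlined helper are stand-ins for illustration; they are assumptions about the shape of the code in `verifiers/utils/eval_display.py`, not verified against it:

```python
import math
from dataclasses import dataclass

@dataclass
class FakeConfig:
    """Stand-in for the eval config fields the helper reads (hypothetical)."""
    max_concurrent: int
    rollouts_per_example: int
    independent_scoring: bool

def display_max_concurrent(config: FakeConfig, total_rollouts: int) -> int:
    """Standalone version of the display helper, per the PR description."""
    mc = config.max_concurrent
    if not config.independent_scoring and config.rollouts_per_example > 1:
        mc = math.ceil(mc / config.rollouts_per_example)
        # Cap in the same unit as mc: concurrent example groups.
        total_rollouts = total_rollouts // config.rollouts_per_example
    if mc > 0 and total_rollouts > 0:
        return min(mc, total_rollouts)
    return mc

def test_caps_to_total_rollouts():
    cfg = FakeConfig(max_concurrent=50, rollouts_per_example=1,
                     independent_scoring=True)
    assert display_max_concurrent(cfg, total_rollouts=8) == 8

def test_scales_by_rollouts_per_example():
    cfg = FakeConfig(max_concurrent=100, rollouts_per_example=4,
                     independent_scoring=False)
    # 5 examples × 4 rollouts = 20 total rollouts; displayed value is 5 groups.
    assert display_max_concurrent(cfg, total_rollouts=20) == 5

def test_independent_scoring_preserves_value():
    cfg = FakeConfig(max_concurrent=12, rollouts_per_example=4,
                     independent_scoring=True)
    assert display_max_concurrent(cfg, total_rollouts=20) == 12
```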