track token usage in eval by willccbb · Pull Request #816 · PrimeIntellect-ai/verifiers

willccbb · 2026-02-03T06:37:21Z

Description

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Touches core eval progress/saving paths and changes concurrency behavior; mistakes could skew reported usage/metrics or alter throughput, but the changes are additive and largely display/metadata-focused.

Overview
Evaluation runs now capture token usage by summing response.usage across trajectory steps into a per-rollout token_usage, and by computing average usage (input/output tokens) in GenerateMetadata when saving results.
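The aggregation described above can be illustrated roughly as follows. This is a minimal sketch, not the code from this PR: the `Usage` dataclass, the `prompt_tokens`/`completion_tokens` field names, the `input_tokens`/`output_tokens` keys, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical stand-in for a response.usage object."""

    prompt_tokens: int
    completion_tokens: int


def sum_rollout_usage(step_usages: list[Optional[Usage]]) -> dict[str, int]:
    """Sum usage over the steps of one rollout (hypothetical helper)."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for usage in step_usages:
        if usage is None:
            # Assumption: steps without usage data contribute nothing.
            continue
        totals["input_tokens"] += usage.prompt_tokens
        totals["output_tokens"] += usage.completion_tokens
    return totals


def average_usage(rollouts: list[dict[str, int]]) -> dict[str, float]:
    """Average per-rollout totals across all rollouts (hypothetical helper)."""
    if not rollouts:
        return {"input_tokens": 0.0, "output_tokens": 0.0}
    n = len(rollouts)
    return {
        "input_tokens": sum(r["input_tokens"] for r in rollouts) / n,
        "output_tokens": sum(r["output_tokens"] for r in rollouts) / n,
    }


# Two rollouts: one with two steps, one with a single step.
rollout_a = sum_rollout_usage([Usage(100, 20), Usage(150, 30)])
rollout_b = sum_rollout_usage([Usage(50, 10), None])
averages = average_usage([rollout_a, rollout_b])
```

Skipping steps that lack usage data is one possible design choice; the PR may handle missing usage differently.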

Live and post-run displays are updated to show token usage: the Rich eval display tracks rolling average tokens and conditionally adds input/output columns to the final summary, and the results-view TUI shows avg input/output tokens in the metadata panel. Numeric formatting is centralized via a new format_numeric helper, and concurrency handling is tweaked to scale max_concurrent when group scoring is used (non-independent_scoring with multiple rollouts).
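To make the display side concrete, here is a small sketch of a rolling average plus a shared numeric formatter. Both `RunningMean` and `format_numeric_sketch` are hypothetical names with assumed behavior; the actual `format_numeric` helper in `verifiers/utils/display_utils.py` may format values differently.

```python
class RunningMean:
    """Incrementally updated mean, e.g. for average tokens per rollout."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Standard incremental mean: no need to store past values.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def format_numeric_sketch(value: float) -> str:
    """Assumed formatting rule: whole numbers without decimals, else 2 places."""
    if float(value).is_integer():
        return f"{int(value):,}"
    return f"{value:,.2f}"


avg_input = RunningMean()
for tokens in (10, 20, 30):
    avg_input.update(tokens)

label = format_numeric_sketch(avg_input.mean)
```

Routing every displayed number through one formatter keeps the live display and the final summary consistent, which appears to be the motivation for centralizing it.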

Written by Cursor Bugbot for commit 6446b78.

verifiers/utils/eval_utils.py

verifiers/scripts/tui.py

verifiers/utils/save_utils.py

verifiers/utils/eval_utils.py

verifiers/utils/eval_display.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

verifiers/utils/display_utils.py

samsja

lgtm

track token usage in eval

0b374d0

cursor bot reviewed Feb 3, 2026

verifiers/utils/eval_utils.py (resolved)

verifiers/scripts/tui.py (outdated, resolved)

bug fixes

8a1fa5c

cursor bot reviewed Feb 4, 2026

verifiers/utils/save_utils.py (resolved)

verifiers/utils/eval_utils.py (outdated, resolved)

bugbot fix

6cb74b0

cursor bot reviewed Feb 4, 2026

verifiers/utils/eval_display.py (outdated, resolved)

bugbot

e3a7936

cursor bot reviewed Feb 4, 2026

verifiers/utils/display_utils.py (resolved)

remove check for pinference

6446b78

samsja approved these changes Feb 4, 2026

willccbb merged commit a1f4839 into main Feb 4, 2026
6 checks passed

willccbb merged 5 commits into main from will/token-usage

2 participants
