track token usage in eval #816

Merged
willccbb merged 5 commits into main from will/token-usage on Feb 4, 2026
@willccbb (Member) commented Feb 3, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches core eval progress/saving paths and changes concurrency behavior; mistakes could skew reported usage/metrics or alter throughput, but the changes are additive and largely display/metadata-focused.

Overview
Evaluation runs now capture token usage by summing response.usage across trajectory steps into per-rollout token_usage, and computing average usage (input/output tokens) in GenerateMetadata when saving results.
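A minimal sketch of the per-rollout summing and averaging described above. It assumes OpenAI-style usage objects with prompt_tokens and completion_tokens fields, and every name in it (Usage, Step, rollout_token_usage, average_token_usage) is hypothetical; it illustrates the idea and is not the code from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    # Hypothetical stand-in for an OpenAI-style response.usage object.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class Step:
    # Hypothetical trajectory step; usage may be None if the provider omits it.
    usage: Optional[Usage] = None


def rollout_token_usage(steps: list[Step]) -> dict[str, int]:
    """Sum usage across every step of one rollout (hypothetical helper)."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for step in steps:
        if step.usage is None:
            continue  # skip steps that report no usage instead of failing
        totals["input_tokens"] += step.usage.prompt_tokens
        totals["output_tokens"] += step.usage.completion_tokens
    return totals


def average_token_usage(per_rollout: list[dict[str, int]]) -> dict[str, float]:
    """Average per-rollout totals, as a results-metadata summary might."""
    if not per_rollout:
        return {"input_tokens": 0.0, "output_tokens": 0.0}
    n = len(per_rollout)
    return {
        key: sum(r[key] for r in per_rollout) / n
        for key in ("input_tokens", "output_tokens")
    }
```

For example, two rollouts totalling 250/50 and 200/40 input/output tokens average to 225.0 input and 45.0 output tokens.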

Live and post-run displays now show token usage: the Rich eval display tracks a rolling average of tokens and conditionally adds input/output columns to the final summary, and the results-view TUI shows average input/output tokens in the metadata panel.

Numeric formatting is centralized in a new format_numeric helper, and concurrency handling is tweaked to scale max_concurrent when group scoring is used (that is, when independent_scoring is off and there are multiple rollouts).
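The rolling average in the live display could be kept as an incremental mean, updated in constant time as each rollout finishes, so the display never has to rescan earlier results. This sketch assumes a cumulative (not windowed) mean; RunningTokenAverage and format_tokens are hypothetical names, and format_tokens is not the PR's format_numeric helper.

```python
class RunningTokenAverage:
    """Hypothetical tracker for mean input/output tokens per finished rollout."""

    def __init__(self) -> None:
        self.count = 0
        self.avg_input = 0.0
        self.avg_output = 0.0

    def update(self, input_tokens: int, output_tokens: int) -> None:
        # Incremental mean: new_avg = old_avg + (x - old_avg) / n
        self.count += 1
        self.avg_input += (input_tokens - self.avg_input) / self.count
        self.avg_output += (output_tokens - self.avg_output) / self.count


def format_tokens(value: float) -> str:
    """Hypothetical formatter: round to a whole number with thousands separators."""
    return f"{value:,.0f}"
```

Updating with rollouts of (250, 50) and (200, 40) tokens yields averages of 225.0 input and 45.0 output tokens, which format_tokens renders as "225" and "45".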

Written by Cursor Bugbot for commit 6446b78.

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@samsja (Member) left a comment

lgtm

@willccbb merged commit a1f4839 into main on Feb 4, 2026
6 checks passed

2 participants