feat: display running average of metrics during vf-eval #443

Closed
anakin87 wants to merge 8 commits into PrimeIntellect-ai:main from anakin87:vf-eval-metrics

@anakin87
Contributor

@anakin87 anakin87 commented Oct 9, 2025

Description

Fixes #416.

When running vf-eval, the average reward and completion length are now displayed in the progress bar as rollouts complete.
To do this, I added a tqdm_gather_with_metrics utility function that aggregates metrics during async task execution. This function is used in a_generate to show running averages.
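
As a rough illustration of the idea described above, here is a minimal, stdlib-only sketch of aggregating metrics while async tasks complete. It is not the PR's `tqdm_gather_with_metrics`: the helper name `gather_with_running_average`, the `on_update` callback, the `fake_rollout` coroutine, and the metric keys `reward` and `completion_len` are all assumptions made for this example. In a real progress bar, the callback could forward the averages to something like tqdm's `set_postfix`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence


# Hypothetical helper for illustration; not the PR's tqdm_gather_with_metrics.
async def gather_with_running_average(
    coros: Sequence[Awaitable[Dict[str, float]]],
    on_update: Callable[[int, Dict[str, float]], None],
) -> List[Dict[str, float]]:
    """Await all coroutines and report running metric averages as each finishes.

    Results are returned in input order, like asyncio.gather.
    """
    results: List[Dict[str, float]] = [{} for _ in coros]
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}

    async def run(index: int, coro: Awaitable[Dict[str, float]]):
        # Tag each result with its input position so order can be restored.
        return index, await coro

    tasks = [asyncio.ensure_future(run(i, c)) for i, c in enumerate(coros)]
    done = 0
    for finished in asyncio.as_completed(tasks):
        index, metrics = await finished
        results[index] = metrics
        done += 1
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
        # Report the averages over everything completed so far.
        on_update(done, {name: totals[name] / counts[name] for name in totals})
    return results


# Stand-in for a rollout; the metric names are assumptions for this sketch.
async def fake_rollout(reward: float, length: int, delay: float) -> Dict[str, float]:
    await asyncio.sleep(delay)
    return {"reward": reward, "completion_len": length}


def demo():
    updates = []
    coros = [
        fake_rollout(1.0, 100, 0.03),
        fake_rollout(0.0, 50, 0.01),
        fake_rollout(0.5, 150, 0.02),
    ]
    results = asyncio.run(
        gather_with_running_average(coros, lambda n, avg: updates.append((n, dict(avg))))
    )
    return results, updates
```

Calling `demo()` returns the per-rollout results in input order plus one `(completed_count, averages)` update per finished task, which is the information a progress bar would need to refresh its display.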

(@mikasenghaas take a look)

Example output

vfscreen.mov

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@anakin87 anakin87 changed the title from "Vf eval metrics" to "feat: display running average of metrics during vf-eval" Oct 9, 2025
@anakin87 anakin87 marked this pull request as ready for review October 9, 2025 12:44
@anakin87
Contributor Author

@willccbb I'd appreciate your feedback

@anakin87
Contributor Author

Closing. Implemented in #693.

@anakin87 anakin87 closed this Jan 10, 2026

Development

Successfully merging this pull request may close these issues.

Display running average of metrics during rollout generation + scoring
