Improvements to math rubric #657

Merged
mikasenghaas merged 10 commits into main from fix-math-rubric
Dec 22, 2025

@mikasenghaas mikasenghaas commented Dec 22, 2025

Description

Fixes to MathRubric:

  • Run in thread pool with configurable number of workers for higher performance
  • Warning logs on exceptions and timeouts
  • Re-raise CancelledError so that timeouts don't log warnings twice
  • Suppress misleading math_verify logs ("Timeout is disabled as parsing_timeout is None or <= 0, you must provide the logic for timeout interuption yourself to prevent code getting stuck.")
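
For illustration, the thread pool, timeout, and logging behaviour listed above could look roughly like the sketch below. It is not the code from this PR: `run_with_timeout`, `slow_verify`, `broken_verify`, the logger name, and the `default` score are hypothetical names chosen for the example. Only `ThreadPoolExecutor`, `run_in_executor`, and `asyncio.wait_for` are taken from the PR description.

```python
import asyncio
import logging
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("math_rubric_sketch")


async def run_with_timeout(executor, func, *args, timeout, default=0.0):
    """Run a blocking callable in a thread pool and bound the wait time.

    Returns `default` on timeout or failure, logging a warning each time.
    """
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, func, *args)
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.CancelledError:
        # Cancellation is handled by the caller; re-raise without logging
        # here so the same event is not reported twice.
        raise
    except asyncio.TimeoutError:
        logger.warning("%s timed out after %.2fs", func.__name__, timeout)
        return default
    except Exception as exc:
        logger.warning("%s failed: %r", func.__name__, exc)
        return default


def slow_verify() -> float:
    """Stand-in for a verification call that takes too long."""
    time.sleep(0.3)
    return 1.0


def broken_verify() -> float:
    """Stand-in for a verification call that raises."""
    raise ValueError("could not parse answer")
```

Calling `asyncio.run(run_with_timeout(executor, slow_verify, timeout=0.05))` with an executor created as `ThreadPoolExecutor(max_workers=1)` returns the default score and logs one warning. The timeout only stops the caller from waiting: the worker thread keeps running until the blocking call returns, which is one reason the pool size matters.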

Also, use MathRubric in gsm8k for testing.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Run MathRubric parse/verify in a thread pool with proper timeout/logging, use it in gsm8k, and bump gsm8k to verifiers>=0.1.8.

  • Rubrics (verifiers/rubrics/math_rubric.py):
    • Run parse/verify via ThreadPoolExecutor (configurable max_workers) with run_in_executor helper.
    • Enforce timeouts using asyncio.wait_for; re-raise CancelledError; add warning logs on failures/timeouts; suppress math_verify logs.
    • Clean up executor in __del__.
  • Environment (environments/gsm8k/gsm8k.py):
    • Replace custom parser/rubric with vf.MathRubric; pass rubric.parser to SingleTurnEnv.
  • Tests (tests/test_math_rubric.py):
    • Set max_workers=1 in timeout test.
  • Packaging (environments/gsm8k/pyproject.toml):
    • Bump version to 0.1.3; require verifiers>=0.1.8.
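
As a rough sketch of the executor lifecycle and log suppression summarised above, under stated assumptions: `PooledScorer` and `close` are hypothetical names, and `"math_verify"` as the logger name is an assumption about how that library names its loggers, not something this PR confirms.

```python
import logging
from concurrent.futures import ThreadPoolExecutor


class PooledScorer:
    """Hypothetical stand-in for a rubric that owns a thread pool."""

    def __init__(self, max_workers: int = 4, noisy_logger: str = "math_verify"):
        # Configurable pool size, mirroring the max_workers option in the PR.
        self.executor = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix="scorer"
        )
        # Raise the threshold on the (assumed) library logger so that
        # messages below ERROR are dropped. Child loggers without their
        # own level inherit this setting.
        logging.getLogger(noisy_logger).setLevel(logging.ERROR)

    def close(self) -> None:
        # cancel_futures requires Python 3.9 or newer.
        self.executor.shutdown(wait=False, cancel_futures=True)

    def __del__(self) -> None:
        # __del__ can run on a partially initialised object, so look the
        # attribute up defensively before shutting the pool down.
        executor = getattr(self, "executor", None)
        if executor is not None:
            executor.shutdown(wait=False, cancel_futures=True)
```

Relying on `__del__` alone leaves the shutdown time up to the garbage collector, so the sketch also exposes an explicit `close()` for callers that want deterministic cleanup.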

Written by Cursor Bugbot for commit b50d99c.

@mikasenghaas mikasenghaas marked this pull request as ready for review December 22, 2025 09:53
@mikasenghaas mikasenghaas changed the title from "Fix math rubric" to "Improvements to math rubric" Dec 22, 2025
@mikasenghaas mikasenghaas merged commit e4d8bf0 into main Dec 22, 2025
6 checks passed