Skip to content

Comments

fix: post_rollout IndexError and missing completion (fixes #726)#937

Open
gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
gspeter-max:verifiersContribution/issue_726
Open

fix: post_rollout IndexError and missing completion (fixes #726)#937
gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
gspeter-max:verifiersContribution/issue_726

Conversation

@gspeter-max
Copy link

@gspeter-max gspeter-max commented Feb 19, 2026

Summary

Fixes #726 - post_rollout can crash with IndexError when trajectory is empty and does not have access to completion field.

Root Cause

  1. post_rollout runs as a cleanup handler (via @vf.cleanup) BEFORE state["completion"] is set
  2. When trajectory is empty (e.g., environment errors before any turns), accessing state["trajectory"][-1] raises IndexError
  3. Inconsistent API: rubrics can access completion but cleanup handlers cannot

Changes

1. Code Changes

  • verifiers/envs/sandbox_env.py: Added comprehensive docstring + error handling wrapper in destroy_sandbox
  • verifiers/envs/experimental/cli_agent_env.py: Added error handling wrapper in destroy_sandbox

2. Tests

  • tests/test_post_rollout_safety.py (NEW): 5 comprehensive test cases
    • Empty trajectory handling
    • Missing trajectory key handling
    • Cleanup with empty trajectory
    • Error handling in destroy_sandbox
    • Custom post_rollout with trajectory access
  • tests/test_multiturn_env.py: Added test for early error scenarios

3. Documentation

  • docs/environments.md: Added "Cleanup Handlers and post_rollout" section with:
    • Execution order explanation
    • Available state data
    • Safe trajectory access patterns
    • Practical examples

Test Results

All 25 tests passed ✅
- test_post_rollout_safety.py: 5/5 passed
- test_multiturn_env.py: 18/18 passed  
- test_sandbox_env.py: 2/2 passed

Impact

  • ✅ Rollouts no longer crash when environment errors occur before any trajectory steps
  • ✅ Environment designers have clear guidance on safe post_rollout patterns
  • ✅ Consistent error handling across all cleanup handlers
  • ✅ Comprehensive test coverage prevents regression
  • ✅ Backward compatible - no breaking changes

Checklist

  • Implementation complete
  • Tests added and passing
  • Documentation updated
  • No over-engineering
  • Follows project conventions

Refs: #726


Note

Low Risk
Small, backward-compatible robustness change limited to cleanup paths plus tests/docs; main behavior change is swallowing post_rollout exceptions to ensure sandbox deletion continues.

Overview
Prevents rollout teardown from crashing when an environment errors before any turns by wrapping post_rollout execution in destroy_sandbox (both SandboxEnv and CliAgentEnv) and logging failures while still deleting the sandbox.

Adds regression coverage for empty/missing state["trajectory"] and early-setup failures, plus documentation clarifying that @vf.cleanup handlers (including post_rollout) run before state["completion"] is set and showing safe trajectory access patterns for environment authors.

Written by Cursor Bugbot for commit 6bb1480. This will update automatically on new commits. Configure here.

Root Cause:
- post_rollout runs as a cleanup handler (via @vf.cleanup) BEFORE state["completion"] is set
- When trajectory is empty (e.g., environment errors before any turns), accessing state["trajectory"][-1] raises IndexError
- Environment designers had to add perfect error handling or risk crashes
- Inconsistent API: rubrics can access completion but cleanup handlers cannot

Changes:
1. Updated sandbox_env.py base post_rollout docstring with comprehensive documentation about:
   - When post_rollout runs (cleanup handler, BEFORE completion is set)
   - What state data is available (trajectory may be empty, error, sandbox_id)
   - Safe access pattern examples

2. Added error handling wrapper in sandbox_env.py destroy_sandbox:
   - Wrapped post_rollout call in try-except
   - Logs warning if post_rollout fails (not error, since cleanup should continue)
   - Ensures sandbox cleanup continues even if post_rollout fails

3. Added same error handling pattern to cli_agent_env.py destroy_sandbox:
   - Follows same pattern as sandbox_env.py
   - Maintains consistency across cleanup handlers

4. Added comprehensive tests in tests/test_post_rollout_safety.py:
   - test_post_rollout_with_empty_trajectory
   - test_post_rollout_with_missing_trajectory
   - test_cleanup_with_empty_trajectory
   - test_destroy_sandbox_handles_post_rollout_error
   - test_custom_post_rollout_with_trajectory_access

5. Added test to tests/test_multiturn_env.py:
   - test_multiturn_rollout_with_early_error - Verifies early errors don't crash cleanup

6. Updated docs/environments.md with new section:
   - "Cleanup Handlers and post_rollout" explaining execution order
   - Available state data in cleanup handlers
   - Safe trajectory access patterns
   - Examples of proper post_rollout implementation

Impact:
- Rollouts no longer crash when environment errors occur before any trajectory steps
- Environment designers have clear guidance on safe post_rollout patterns
- Consistent error handling across all cleanup handlers
- Comprehensive test coverage prevents regression

Refs: PrimeIntellect-ai#726
Root Cause:
- ErrorDuringSetupEnv test class was missing implementation of abstract method env_response
- Runtime errors were not being caught because they weren't vf.Error types

Changes:
- Added env_response method implementation to ErrorDuringSetupEnv
- Changed RuntimeError to vf.SandboxError so error is properly caught by framework
- env_response returns empty list since it should never be called due to setup error

Impact:
- All tests now pass (25/25)
- Test properly validates early error handling with empty trajectory

Refs: PrimeIntellect-ai#726
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

…uidance

Root Cause:
- PR added new documentation about cleanup handlers and post_rollout to docs/environments.md
- Project rule requires skills files to be updated when user-facing documentation changes

Changes:
- Added "Resource Management and Cleanup Handlers" section to skills/create-environments/SKILL.md
- Documented critical execution order: cleanup runs BEFORE completion is set
- Added safe post_rollout implementation examples
- Listed available state data in cleanup handlers
- Provided safe trajectory access patterns
- Referenced full documentation in docs/environments.md

Impact:
- Skills file now reflects the new cleanup handler guidance
- Users following the skill will get proper cleanup handler patterns
- Addresses Cursor Bugbot feedback about missing skills update

Refs: PrimeIntellect-ai#726
@gspeter-max
Copy link
Author

Thanks for the feedback @cursor[bot]! I've updated the file to include the new cleanup handler and post_rollout guidance.

The changes include:

  • Critical execution order documentation (cleanup runs BEFORE completion)
  • Safe post_rollout implementation examples
  • Available state data in cleanup handlers
  • Safe trajectory access patterns
  • Reference to full documentation in docs/environments.md

This ensures users following the skill will get proper cleanup handler patterns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

post_rollout can crash + does not have completion rendered

3 participants