Skip to content

Conversation

@edgarpavlovsky
Copy link
Member

Summary

Fixes the terminal-bench integration test that was disabled in CI.

Root Cause

The benchmark/pyproject.toml was missing package configuration, causing the Fireteam adapter installation to fail with:

ValueError: Unable to determine which files to ship inside the wheel

Changes

  1. Fixed pyproject.toml - Added [tool.hatch.build.targets.wheel] with packages = ["adapters"]
  2. Enabled CI test - Changed from if: false to run on main and e/* branches
  3. Fixed PATH issues - Added $HOME/.local/bin to PATH for uv and terminal-bench
  4. Added timeouts - 20min job timeout, 15min step timeout to prevent hanging
  5. Better logging - Added timestamps and unbuffered Python output
  6. Local testing setup - Added .actrc for testing with act locally

Testing

  • ✅ Package installs successfully locally
  • ✅ Package installs successfully in act container
  • ✅ Adapter is importable
  • ✅ Test executes (verified with act)

Local Iteration

Developers can now test this workflow locally using:

act -j integration-tests --secret-file .secrets --reuse

This PR will verify the test runs successfully in the actual CI environment.

Note: Branch name starts with e/ to trigger integration tests as configured in the workflow.

- Fix pyproject.toml: Add [tool.hatch.build.targets.wheel] packages specification
- Enable terminal-bench test in CI workflow with proper conditions
- Add PATH fixes for uv and terminal-bench binaries
- Add timeouts to prevent hanging (20min job, 15min step)
- Add .actrc for local GitHub Actions testing with act
- Add .secrets to .gitignore

The terminal-bench adapter package was failing to build because hatchling
didn't know which files to include. Now it correctly includes the adapters/
directory and the test can run in CI.
The adapter was missing the memory module when copying Fireteam code
into the terminal-bench container, causing ModuleNotFoundError when
orchestrator.py tried to import memory.manager.

This wasn't caught in local act testing because we interrupted the test
before it reached actual execution inside the container.
The --livestream flag was causing terminal-bench to hide console output
and stream to tmux instead, making it impossible to see progress or
debug issues in CI/act logs. Removing it allows output to appear
normally in pytest/CI logs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants