Add TauBench Verified benchmark integration #4
Summary
Add a new TauBench Verified benchmark suite by mirroring the existing TauBench integration and wiring it through registry, config, and docs metadata.
What are you adding?
Changes Made
tau_bench_verified_{retail,airline,telecom} evals plus dataset/solver/scorer wrappers following the existing TauBench file pattern.
TAU2_DATA_DIR switching safely.
Testing
pytest)
pre-commit run --all-files)
Checklist
Related Issues
N/A
Additional Context
Verified TauBench data can be overridden via OPENBENCH_TAU2_VERIFIED_DATA_DIR when needed.
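For reviewers, here is a minimal sketch of how the override and the safe TAU2_DATA_DIR switching could fit together. It is not the code in this PR. It assumes that both OPENBENCH_TAU2_VERIFIED_DATA_DIR and TAU2_DATA_DIR are environment variables, and the helper name verified_data_dir and its default_dir parameter are hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional

OVERRIDE_VAR = "OPENBENCH_TAU2_VERIFIED_DATA_DIR"  # name taken from this PR
TARGET_VAR = "TAU2_DATA_DIR"  # name taken from this PR


@contextmanager
def verified_data_dir(default_dir: str) -> Iterator[str]:
    """Hypothetical helper: temporarily point TAU2_DATA_DIR at the verified data.

    Assumption: both names above are environment variables. The previous
    value of TAU2_DATA_DIR is restored on exit, even if the body raises.
    """
    # Prefer the override when it is set and non-empty, else use the default.
    chosen = os.environ.get(OVERRIDE_VAR) or default_dir
    previous: Optional[str] = os.environ.get(TARGET_VAR)
    os.environ[TARGET_VAR] = chosen
    try:
        yield chosen
    finally:
        if previous is None:
            # The variable was unset before, so unset it again.
            os.environ.pop(TARGET_VAR, None)
        else:
            os.environ[TARGET_VAR] = previous
```

Wrapping the switch in a context manager keeps the change scoped to the verified evals, so other TauBench evals that run in the same process would still see the original TAU2_DATA_DIR.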