Releases: ctrl-gaurav/BeyondBench
BeyondBench v0.1.0
BeyondBench v0.1.0 - FastAPI Server, CLI Improvements, CI/CD, Comprehensive Tests
What's New in v0.1.0
BeyondBench v0.1.0 is a major improvement over v0.0.2, bringing production-grade tooling, a REST API server, comprehensive CLI commands, CI/CD pipelines, and 239 tests.
Highlights
- FastAPI REST API Server (beyondbench serve) with endpoints for task listing, evaluation, job tracking, and result retrieval
- New CLI Commands: beyondbench init, beyondbench info <task>, beyondbench results list/show/compare
- Config File Support: JSON schema validation, example configs, beyondbench run-config
- GitHub Actions CI/CD: Automated testing (Python 3.10-3.13), linting, PyPI publishing via OIDC
- 239 Tests: Comprehensive test suite covering data generation and evaluation for all task suites
- Python 3.10+: Minimum Python version raised to 3.10
Added
- FastAPI REST API server (beyondbench serve) with 7 endpoints (/health, /tasks, /evaluate, /jobs, /results, etc.)
- beyondbench init command for interactive config file creation
- beyondbench info <task> command for viewing task details with Rich formatting
- beyondbench results list/show/compare commands for results viewer
- Config file validation with JSON schema (beyondbench/configs/schema.json)
- Example configs: default.yaml, openai_example.yaml, full_evaluation.yaml
- Comprehensive test suite (239 tests) with real data generation and evaluation
- PEP 561 py.typed marker for type checking support
- __main__.py for python -m beyondbench support
- Exponential backoff for API rate limiting
- GitHub Actions: test.yml (Python 3.10-3.13 matrix), lint.yml, publish.yml (PyPI OIDC)
- Pre-commit configuration (ruff + hooks)
- CONTRIBUTING.md, SECURITY.md
- Issue templates (bug report, feature request) and PR template
- Dependabot configuration for automated dependency updates
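For readers unfamiliar with the exponential backoff item above, here is a minimal, generic sketch of the technique. It is not BeyondBench's actual implementation: the function name call_with_backoff and all of its parameters are hypothetical, chosen only for illustration.

```python
import random
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays on failure.

    Hypothetical helper for illustration; not part of the BeyondBench API.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A small random jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay * 0.1))
```

In practice, retry_on would be narrowed to the rate-limit exception raised by the API client in use, so that unrelated errors fail immediately instead of being retried.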
Changed
- License changed from MIT to Apache-2.0
- Minimum Python version raised from 3.8 to 3.10
- Simplified ModelHandler GPT-5 logic
- Single version source of truth via importlib.metadata
- Wizard now wires to actual evaluation pipeline
- Improved error messages with Rich formatting and install suggestions
Fixed
- Token extraction for all API backends (OpenAI, Gemini, Anthropic)
- EvaluationEngine seed handling for reproducible evaluations
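To illustrate why seed handling matters for reproducibility, the sketch below generates task data from a private, seeded random number generator, so the same seed always yields the same instances. This is a generic example, not the EvaluationEngine code: the function sample_task_instances and its parameters are hypothetical.

```python
import random


def sample_task_instances(seed, count=3, size=5, low=1, high=100):
    """Generate `count` lists of `size` random integers from a seeded RNG.

    Hypothetical helper for illustration; not part of the BeyondBench API.
    """
    # A local Random instance is unaffected by other code that calls
    # the module-level random functions or reseeds the global RNG.
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(size)] for _ in range(count)]
```

Using a dedicated generator per evaluation, rather than calling random.seed() globally, keeps results stable even when other libraries consume random numbers in between.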
Removed
- Redundant setup.py (replaced by pyproject.toml)
- Dead code: simple_cli.py, PlaceholderTask
Installation
pip install beyondbench==0.1.0
# With optional dependencies
pip install beyondbench[all-apis] # OpenAI, Gemini, Anthropic
pip install beyondbench[serve] # FastAPI server
pip install beyondbench[full] # Everything
Quick Start
# Interactive wizard
beyondbench
# Evaluate a model
beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy
# Start API server
beyondbench serve --port 8000
# Create config interactively
beyondbench init
Full Changelog: https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md
Documentation: https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md
Paper: https://arxiv.org/abs/2509.24210
Leaderboard: https://ctrl-gaurav.github.io/BeyondBench/