BeyondBench v0.1.0

2026-03-06T10:28:05Z

BeyondBench v0.1.0 - FastAPI Server, CLI Improvements, CI/CD, Comprehensive Tests

What's New in v0.1.0

BeyondBench v0.1.0 is a major improvement over v0.0.2, bringing production-grade tooling, a REST API server, comprehensive CLI commands, CI/CD pipelines, and 239 tests.

Highlights

FastAPI REST API Server (beyondbench serve) with endpoints for task listing, evaluation, job tracking, and result retrieval
New CLI Commands: beyondbench init, beyondbench info <task>, beyondbench results list/show/compare
Config File Support: JSON schema validation, example configs, beyondbench run-config
GitHub Actions CI/CD: Automated testing (Python 3.10-3.13), linting, PyPI publishing via OIDC
239 Tests: Comprehensive test suite covering data generation and evaluation for all task suites
Python 3.10+: Minimum Python version raised to 3.10
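
To illustrate what validating a config before a run involves, here is a minimal standard-library sketch. It is a simplified stand-in for full JSON Schema validation, not BeyondBench's actual validator: the field names (model_id, suite, seed) and the validate_config helper are hypothetical and are not taken from beyondbench/configs/schema.json.

```python
# Hypothetical required fields and their expected types (illustration only).
SCHEMA = {"model_id": str, "suite": str, "seed": int}


def validate_config(config, schema=SCHEMA):
    """Return a list of human-readable errors; an empty list means the config passed."""
    errors = []
    for key, expected in schema.items():
        if key not in config:
            errors.append(f"missing required key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject it explicitly (no field here is boolean).
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

Collecting all errors instead of raising on the first one lets a CLI report every problem in a config file in a single pass.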

Added

FastAPI REST API server (beyondbench serve) with 7 endpoints (/health, /tasks, /evaluate, /jobs, /results, etc.)
beyondbench init command for interactive config file creation
beyondbench info <task> command for viewing task details with Rich formatting
beyondbench results list/show/compare commands for viewing and comparing saved results
Config file validation with JSON schema (beyondbench/configs/schema.json)
Example configs: default.yaml, openai_example.yaml, full_evaluation.yaml
Comprehensive test suite (239 tests) with real data generation and evaluation
PEP 561 py.typed marker for type checking support
__main__.py for python -m beyondbench support
Exponential backoff for API rate limiting
GitHub Actions: test.yml (Python 3.10-3.13 matrix), lint.yml, publish.yml (PyPI OIDC)
Pre-commit configuration (ruff + hooks)
CONTRIBUTING.md, SECURITY.md
Issue templates (bug report, feature request) and PR template
Dependabot configuration for automated dependency updates
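
For readers unfamiliar with the exponential backoff item above, the general pattern can be sketched as follows. This is a generic illustration under stated assumptions, not BeyondBench's implementation: RateLimitError, call_with_backoff, and all parameter names and defaults are hypothetical.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Wait base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable so the retry schedule can be checked in tests without actually waiting.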

Changed

License changed from MIT to Apache-2.0
Minimum Python version raised from 3.8 to 3.10
Simplified ModelHandler GPT-5 logic
Single version source of truth via importlib.metadata
Interactive wizard is now wired to the actual evaluation pipeline
Improved error messages with Rich formatting and install suggestions
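
The "single version source of truth" item refers to reading the version from the installed package metadata rather than hard-coding it in several places. A minimal sketch of that pattern, assuming a hypothetical get_version helper and fallback string (the package's real code may differ):

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str, default: str = "0.0.0+unknown") -> str:
    """Return the installed distribution's version, or a fallback if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens when running from a source checkout that was never pip-installed.
        return default
```

With this approach the version declared in pyproject.toml is the only place that needs updating at release time; for example, get_version("beyondbench") would return the installed release string.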

Fixed

Token extraction for all API backends (OpenAI, Gemini, Anthropic)
EvaluationEngine seed handling for reproducible evaluations
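
As an illustration of why seed handling matters for reproducibility, the sketch below derives task data from a dedicated seeded generator, so the same seed always yields the same data. The sample_problems function is hypothetical and does not reflect the EvaluationEngine internals.

```python
import random


def sample_problems(seed, n=5, low=0, high=100):
    """Generate n pseudo-random integers that depend only on the given seed."""
    # A local Random instance keeps results independent of the global random state.
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]
```

Using a local generator instead of random.seed() avoids interference from other code that draws from the module-level generator between runs.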

Removed

Redundant setup.py (replaced by pyproject.toml)
Dead code: simple_cli.py, PlaceholderTask

Installation

pip install beyondbench==0.1.0

# With optional dependencies
# Extras are quoted so shells such as zsh do not expand the square brackets
pip install "beyondbench[all-apis]"    # OpenAI, Gemini, Anthropic
pip install "beyondbench[serve]"       # FastAPI server
pip install "beyondbench[full]"        # Everything

Quick Start

# Interactive wizard
beyondbench

# Evaluate a model
beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy

# Start API server
beyondbench serve --port 8000

# Create config interactively
beyondbench init

Full Changelog: https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md
Documentation: https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md
Paper: https://arxiv.org/abs/2509.24210
Leaderboard: https://ctrl-gaurav.github.io/BeyondBench/
