
BeyondBench v0.1.0

06 Mar 10:28


BeyondBench v0.1.0 - FastAPI Server, CLI Improvements, CI/CD, Comprehensive Tests

What's New in v0.1.0

BeyondBench v0.1.0 is a major update over v0.0.2. It adds a FastAPI REST API server, new CLI commands, config-file support, GitHub Actions CI/CD pipelines, and a 239-test suite, along with packaging and developer-tooling improvements.


Highlights

  • FastAPI REST API Server (beyondbench serve) with endpoints for task listing, evaluation, job tracking, and result retrieval
  • New CLI Commands: beyondbench init, beyondbench info <task>, beyondbench results list/show/compare
  • Config File Support: JSON schema validation, example configs, beyondbench run-config
  • GitHub Actions CI/CD: Automated testing (Python 3.10-3.13), linting, PyPI publishing via OIDC
  • 239 Tests: Comprehensive test suite covering data generation and evaluation for all task suites
  • Python 3.10+: Minimum Python version raised to 3.10

Added

  • FastAPI REST API server (beyondbench serve) with 7 endpoints (/health, /tasks, /evaluate, /jobs, /results, etc.)
  • beyondbench init command for interactive config file creation
  • beyondbench info <task> command for viewing task details with Rich formatting
  • beyondbench results list/show/compare commands for listing, inspecting, and comparing saved results
  • Config file validation with JSON schema (beyondbench/configs/schema.json)
  • Example configs: default.yaml, openai_example.yaml, full_evaluation.yaml
  • Comprehensive test suite (239 tests) with real data generation and evaluation
  • PEP 561 py.typed marker for type checking support
  • __main__.py, enabling python -m beyondbench
  • Exponential backoff when API calls hit provider rate limits
  • GitHub Actions: test.yml (Python 3.10-3.13 matrix), lint.yml, publish.yml (PyPI OIDC)
  • Pre-commit configuration (ruff + hooks)
  • CONTRIBUTING.md, SECURITY.md
  • Issue templates (bug report, feature request) and PR template
  • Dependabot configuration for automated dependency updates
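The notes do not describe how the exponential backoff is implemented. As a general illustration of the technique only, here is a minimal Python sketch of exponential backoff with jitter; `call_with_backoff` and `RateLimitError` are hypothetical names, not part of BeyondBench's API.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) exception."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # out of retries: surface the error to the caller
            # Nominal delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Scale by a random factor in [0.5, 1.0] (jitter) so that parallel
            # clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1
```

Injecting `sleep` keeps the helper testable without real waits; in normal use the default `time.sleep` applies.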

Changed

  • License changed from MIT to Apache-2.0
  • Minimum Python version raised from 3.8 to 3.10
  • Simplified ModelHandler GPT-5 logic
  • Single source of truth for the package version, read at runtime via importlib.metadata
  • The interactive wizard is now wired to the actual evaluation pipeline
  • Improved error messages with Rich formatting and install suggestions

Fixed

  • Token extraction for all API backends (OpenAI, Gemini, Anthropic)
  • EvaluationEngine seed handling for reproducible evaluations
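The notes do not say how EvaluationEngine handles seeds. The general idea behind seed-based reproducibility can be sketched as follows: derive every random choice from a dedicated, explicitly seeded generator so that the same seed regenerates the same task instances. `generate_instances` and its list-of-integers output are hypothetical.

```python
import random


def generate_instances(task_name: str, n: int, seed: int) -> list[list[int]]:
    """Hypothetical generator: n random integer lists, fully determined by (task_name, seed)."""
    # A dedicated Random instance neither reads nor modifies the global random
    # state, so unrelated code cannot perturb this task's sequence. String seeds
    # are hashed deterministically, so results are stable across runs.
    rng = random.Random(f"{task_name}:{seed}")
    return [[rng.randint(0, 99) for _ in range(5)] for _ in range(n)]
```

Calling `generate_instances("sorting", 3, seed=42)` twice returns identical lists, while a different seed yields a different set. Seeding per task also keeps tasks independent: changing how many instances one task draws does not shift another task's instances.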

Removed

  • Redundant setup.py (replaced by pyproject.toml)
  • Dead code: simple_cli.py, PlaceholderTask

Installation

pip install beyondbench==0.1.0

# With optional dependencies
pip install "beyondbench[all-apis]"  # OpenAI, Gemini, Anthropic
pip install "beyondbench[serve]"     # FastAPI server
pip install "beyondbench[full]"      # Everything

Quick Start

# Interactive wizard
beyondbench

# Evaluate a model
beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy

# Start API server
beyondbench serve --port 8000

# Create config interactively
beyondbench init
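Once `beyondbench serve --port 8000` is running, its endpoints can be called from any HTTP client. Below is a minimal standard-library sketch, assuming the server listens on localhost:8000 and that /health and /tasks answer GET requests with JSON bodies; the response schema is not documented in these notes, and `build_get`/`get_json` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumed to match `beyondbench serve --port 8000`


def build_get(path: str, base_url: str = BASE_URL) -> Request:
    """Build a GET request for a server endpoint such as /health or /tasks."""
    return Request(
        base_url.rstrip("/") + path,
        headers={"Accept": "application/json"},
        method="GET",
    )


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and decode the JSON body (its structure is not assumed here)."""
    with urlopen(build_get(path, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server up, `get_json("/health")` returns the decoded health-check payload; other endpoints such as /evaluate may expect POST bodies, which this sketch does not cover.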

Full Changelog: https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md
Documentation: https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md
Paper: https://arxiv.org/abs/2509.24210
Leaderboard: https://ctrl-gaurav.github.io/BeyondBench/