Release notes from BeyondBench 2026-03-06T08:40:32Z 2026-03-06T10:28:05Z BeyondBench v0.1.0 <h1>BeyondBench v0.1.0 - FastAPI Server, CLI Improvements, CI/CD, Comprehensive Tests</h1> <h2>What's New in v0.1.0</h2> <p>BeyondBench v0.1.0 is a major improvement over v0.0.2, bringing production-grade tooling, a REST API server, comprehensive CLI commands, CI/CD pipelines, and 239 tests.</p> <hr> <h2>Highlights</h2> <ul> <li><strong>FastAPI REST API Server</strong> (<code>beyondbench serve</code>) with endpoints for task listing, evaluation, job tracking, and result retrieval</li> <li><strong>New CLI Commands</strong>: <code>beyondbench init</code>, <code>beyondbench info &lt;task&gt;</code>, <code>beyondbench results list/show/compare</code></li> <li><strong>Config File Support</strong>: JSON schema validation, example configs, <code>beyondbench run-config</code></li> <li><strong>GitHub Actions CI/CD</strong>: Automated testing (Python 3.10-3.13), linting, PyPI publishing via OIDC</li> <li><strong>239 Tests</strong>: Comprehensive test suite covering data generation and evaluation for all task suites</li> <li><strong>Python 3.10+</strong>: Minimum Python version raised to 3.10</li> </ul> <hr> <h2>Added</h2> <ul> <li>FastAPI REST API server (<code>beyondbench serve</code>) with 7 endpoints (<code>/health</code>, <code>/tasks</code>, <code>/evaluate</code>, <code>/jobs</code>, <code>/results</code>, etc.)</li> <li><code>beyondbench init</code> command for interactive config file creation</li> <li><code>beyondbench info &lt;task&gt;</code> command for viewing task details with Rich formatting</li> <li><code>beyondbench results list/show/compare</code> commands for listing, viewing, and comparing results</li> <li>Config file validation with JSON schema (<code>beyondbench/configs/schema.json</code>)</li> <li>Example configs: <code>default.yaml</code>, 
<code>openai_example.yaml</code>, <code>full_evaluation.yaml</code></li> <li>Comprehensive test suite (239 tests) with real data generation and evaluation</li> <li>PEP 561 <code>py.typed</code> marker for type checking support</li> <li><code>__main__.py</code> for <code>python -m beyondbench</code> support</li> <li>Exponential backoff for API rate limiting</li> <li>GitHub Actions: <code>test.yml</code> (Python 3.10-3.13 matrix), <code>lint.yml</code>, <code>publish.yml</code> (PyPI OIDC)</li> <li>Pre-commit configuration (ruff + hooks)</li> <li><code>CONTRIBUTING.md</code>, <code>SECURITY.md</code></li> <li>Issue templates (bug report, feature request) and PR template</li> <li>Dependabot configuration for automated dependency updates</li> </ul> <h2>Changed</h2> <ul> <li>License changed from MIT to Apache-2.0</li> <li>Minimum Python version raised from 3.8 to 3.10</li> <li>Simplified ModelHandler GPT-5 logic</li> <li>Single version source of truth via <code>importlib.metadata</code></li> <li>Interactive wizard is now wired to the actual evaluation pipeline</li> <li>Improved error messages with Rich formatting and install suggestions</li> </ul> <h2>Fixed</h2> <ul> <li>Token extraction for all API backends (OpenAI, Gemini, Anthropic)</li> <li>EvaluationEngine seed handling for reproducible evaluations</li> </ul> <h2>Removed</h2> <ul> <li>Redundant <code>setup.py</code> (replaced by <code>pyproject.toml</code>)</li> <li>Dead code: <code>simple_cli.py</code>, <code>PlaceholderTask</code></li> </ul> <hr> <h2>Installation</h2> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="pip install beyondbench==0.1.0 # With optional dependencies pip install beyondbench[all-apis] # OpenAI, Gemini, Anthropic pip install beyondbench[serve] # FastAPI server pip install beyondbench[full] # Everything"><pre>pip install beyondbench==0.1.0 <span class="pl-c"><span class="pl-c">#</span> With optional dependencies</span> pip install 
beyondbench[all-apis] <span class="pl-c"><span class="pl-c">#</span> OpenAI, Gemini, Anthropic</span> pip install beyondbench[serve] <span class="pl-c"><span class="pl-c">#</span> FastAPI server</span> pip install beyondbench[full] <span class="pl-c"><span class="pl-c">#</span> Everything</span></pre></div> <h2>Quick Start</h2> <div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# Interactive wizard beyondbench # Evaluate a model beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy # Start API server beyondbench serve --port 8000 # Create config interactively beyondbench init"><pre><span class="pl-c"><span class="pl-c">#</span> Interactive wizard</span> beyondbench <span class="pl-c"><span class="pl-c">#</span> Evaluate a model</span> beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy <span class="pl-c"><span class="pl-c">#</span> Start API server</span> beyondbench serve --port 8000 <span class="pl-c"><span class="pl-c">#</span> Create config interactively</span> beyondbench init</pre></div> <hr> <p><strong>Full Changelog</strong>: <a href="https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md">https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md</a><br> <strong>Documentation</strong>: <a href="https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md">https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md</a><br> <strong>Paper</strong>: <a href="https://arxiv.org/abs/2509.24210" rel="nofollow">https://arxiv.org/abs/2509.24210</a><br> <strong>Leaderboard</strong>: <a href="https://ctrl-gaurav.github.io/BeyondBench/" rel="nofollow">https://ctrl-gaurav.github.io/BeyondBench/</a></p> ctrl-gaurav