tag:github.com,2008:https://github.com/ctrl-gaurav/BeyondBench/releases
Release notes from BeyondBench
2026-03-06T08:40:32Z
tag:github.com,2008:Repository/1150363086/v0.1.0
2026-03-06T10:28:05Z
BeyondBench v0.1.0
<h1>BeyondBench v0.1.0 - FastAPI Server, CLI Improvements, CI/CD, Comprehensive Tests</h1>
<h2>What's New in v0.1.0</h2>
<p>BeyondBench v0.1.0 builds on v0.0.2 with a FastAPI REST API server, new CLI commands, config file validation, GitHub Actions CI/CD pipelines, and a 239-test suite.</p>
<hr>
<h2>Highlights</h2>
<ul>
<li><strong>FastAPI REST API Server</strong> (<code>beyondbench serve</code>) with endpoints for task listing, evaluation, job tracking, and result retrieval</li>
<li><strong>New CLI Commands</strong>: <code>beyondbench init</code>, <code>beyondbench info &lt;task&gt;</code>, <code>beyondbench results list/show/compare</code></li>
<li><strong>Config File Support</strong>: JSON schema validation, example configs, <code>beyondbench run-config</code></li>
<li><strong>GitHub Actions CI/CD</strong>: Automated testing (Python 3.10-3.13), linting, PyPI publishing via OIDC</li>
<li><strong>239 Tests</strong>: Comprehensive test suite covering data generation and evaluation for all task suites</li>
<li><strong>Python 3.10+</strong>: Minimum Python version raised to 3.10</li>
</ul>
<hr>
<h2>Added</h2>
<ul>
<li>FastAPI REST API server (<code>beyondbench serve</code>) with 7 endpoints (<code>/health</code>, <code>/tasks</code>, <code>/evaluate</code>, <code>/jobs</code>, <code>/results</code>, etc.)</li>
<li><code>beyondbench init</code> command for interactive config file creation</li>
<li><code>beyondbench info &lt;task&gt;</code> command for viewing task details with Rich formatting</li>
<li><code>beyondbench results list/show/compare</code> commands for listing, viewing, and comparing saved results</li>
<li>Config file validation with JSON schema (<code>beyondbench/configs/schema.json</code>)</li>
<li>Example configs: <code>default.yaml</code>, <code>openai_example.yaml</code>, <code>full_evaluation.yaml</code></li>
<li>Comprehensive test suite (239 tests) with real data generation and evaluation</li>
<li>PEP 561 <code>py.typed</code> marker for type checking support</li>
<li><code>__main__.py</code> for <code>python -m beyondbench</code> support</li>
<li>Exponential backoff for API rate limiting</li>
<li>GitHub Actions: <code>test.yml</code> (Python 3.10-3.13 matrix), <code>lint.yml</code>, <code>publish.yml</code> (PyPI OIDC)</li>
<li>Pre-commit configuration (ruff + hooks)</li>
<li><code>CONTRIBUTING.md</code>, <code>SECURITY.md</code></li>
<li>Issue templates (bug report, feature request) and PR template</li>
<li>Dependabot configuration for automated dependency updates</li>
</ul>
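<p>For readers unfamiliar with the technique, exponential backoff for rate-limited API calls can be sketched as below. This is a minimal illustration, not BeyondBench's actual implementation: the function name <code>call_with_backoff</code>, its parameters, and the full-jitter strategy are all assumptions made for the example.</p>

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponentially growing, jittered delays.

    Hypothetical helper for illustration; not part of the BeyondBench API.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles each attempt (base, 2*base, 4*base, ...)
            # and is capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random duration in [0, cap] so that
            # concurrent clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

<p>In practice <code>retry_on</code> would be narrowed to the rate-limit exception raised by the API client in use, so that unrelated errors fail immediately.</p>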
<h2>Changed</h2>
<ul>
<li>License changed from MIT to Apache-2.0</li>
<li>Minimum Python version raised from 3.8 to 3.10</li>
<li>Simplified ModelHandler GPT-5 logic</li>
<li>Single version source of truth via <code>importlib.metadata</code></li>
<li>Interactive wizard now runs the actual evaluation pipeline</li>
<li>Improved error messages with Rich formatting and install suggestions</li>
</ul>
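<p>The single-source versioning pattern mentioned above usually means reading the version from the installed package metadata instead of hard-coding it in the source. A minimal sketch using the standard library follows; the helper name <code>get_version</code> and the fallback string are assumptions for illustration, and the project's own code may differ.</p>

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of dist_name, or fallback if not installed.

    Hypothetical helper for illustration; not part of the BeyondBench API.
    """
    try:
        # Reads the version recorded in the distribution's metadata,
        # which is generated from pyproject.toml at build time.
        return version(dist_name)
    except PackageNotFoundError:
        # Happens, for example, when running from a source checkout
        # that was never installed.
        return fallback


# A package's __init__.py could then expose, for example:
# __version__ = get_version("beyondbench")
```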
<h2>Fixed</h2>
<ul>
<li>Token extraction for all API backends (OpenAI, Gemini, Anthropic)</li>
<li>EvaluationEngine seed handling for reproducible evaluations</li>
</ul>
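<p>As background on why seed handling matters for reproducibility: a common pattern is to give each run its own seeded generator instead of relying on global random state. The sketch below illustrates that idea only. <code>generate_instances</code> is a hypothetical function, and nothing here reflects how <code>EvaluationEngine</code> is actually implemented.</p>

```python
import random


def generate_instances(seed: int, n: int = 3, size: int = 5):
    """Generate n lists of random integers, fully determined by seed.

    Hypothetical example; not part of the BeyondBench API.
    """
    # A local generator is independent of the global random state,
    # so other code calling random.seed() or random.random() cannot
    # change what this function produces.
    rng = random.Random(seed)
    return [[rng.randint(0, 99) for _ in range(size)] for _ in range(n)]
```

<p>Calling <code>generate_instances(42)</code> twice yields identical data regardless of what happens to the global generator in between.</p>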
<h2>Removed</h2>
<ul>
<li>Redundant <code>setup.py</code> (replaced by <code>pyproject.toml</code>)</li>
<li>Dead code: <code>simple_cli.py</code>, <code>PlaceholderTask</code></li>
</ul>
<hr>
<h2>Installation</h2>
<div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="pip install beyondbench==0.1.0
# With optional dependencies
pip install beyondbench[all-apis] # OpenAI, Gemini, Anthropic
pip install beyondbench[serve] # FastAPI server
pip install beyondbench[full] # Everything"><pre>pip install beyondbench==0.1.0
<span class="pl-c"><span class="pl-c">#</span> With optional dependencies</span>
pip install beyondbench[all-apis] <span class="pl-c"><span class="pl-c">#</span> OpenAI, Gemini, Anthropic</span>
pip install beyondbench[serve] <span class="pl-c"><span class="pl-c">#</span> FastAPI server</span>
pip install beyondbench[full] <span class="pl-c"><span class="pl-c">#</span> Everything</span></pre></div>
<h2>Quick Start</h2>
<div class="highlight highlight-source-shell notranslate position-relative overflow-auto" data-snippet-clipboard-copy-content="# Interactive wizard
beyondbench
# Evaluate a model
beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy
# Start API server
beyondbench serve --port 8000
# Create config interactively
beyondbench init"><pre><span class="pl-c"><span class="pl-c">#</span> Interactive wizard</span>
beyondbench
<span class="pl-c"><span class="pl-c">#</span> Evaluate a model</span>
beyondbench evaluate --model-id gpt-4o --api-provider openai --suite easy
<span class="pl-c"><span class="pl-c">#</span> Start API server</span>
beyondbench serve --port 8000
<span class="pl-c"><span class="pl-c">#</span> Create config interactively</span>
beyondbench init</pre></div>
<hr>
<p><strong>Full Changelog</strong>: <a href="https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md">https://github.com/ctrl-gaurav/BeyondBench/blob/main/CHANGELOG.md</a><br>
<strong>Documentation</strong>: <a href="https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md">https://github.com/ctrl-gaurav/BeyondBench/blob/main/docs/DOCUMENTATION.md</a><br>
<strong>Paper</strong>: <a href="https://arxiv.org/abs/2509.24210" rel="nofollow">https://arxiv.org/abs/2509.24210</a><br>
<strong>Leaderboard</strong>: <a href="https://ctrl-gaurav.github.io/BeyondBench/" rel="nofollow">https://ctrl-gaurav.github.io/BeyondBench/</a></p>
ctrl-gaurav