feat: add GitHub Actions CI and reusable benchmark workflow #13
Conversation
…ents Phase 4 implementation:
- .github/workflows/ci.yml: lint, test, and Docker build checks on PRs
- .github/workflows/benchmark.yml: reusable workflow for SDK repos to call from their CI to run benchmarks and post results
- .github/actions/benchmark/action.yml: composite action wrapper
- lib/github.py: sticky PR comment posting via gh CLI, with per-endpoint overhead tables and regression detection (see the sketch below)
- bench.py: add post-comment CLI command
- tests/test_github.py: tests for comment formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
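For context, a sticky comment flow via the gh CLI typically finds a previously posted comment by a hidden marker and updates it in place rather than stacking new comments. A minimal sketch, assuming a hypothetical marker and helper names (not the actual lib/github.py API):

```python
# Hypothetical sketch of sticky PR comment posting via the gh CLI.
# MARKER and the function names are illustrative, not the repo's actual code.
import json
import subprocess

MARKER = "<!-- sdk-benchmarks-report -->"

def _gh(*args: str) -> str:
    """Run a gh CLI command and return its stdout."""
    return subprocess.run(["gh", *args], check=True, capture_output=True, text=True).stdout

def post_sticky_comment(repo: str, pr_number: int, body: str) -> None:
    body = f"{MARKER}\n{body}"
    comments = json.loads(_gh("api", f"repos/{repo}/issues/{pr_number}/comments"))
    existing = next((c for c in comments if MARKER in c.get("body", "")), None)
    if existing:
        # Update the existing benchmark comment in place.
        _gh("api", "-X", "PATCH", f"repos/{repo}/issues/comments/{existing['id']}",
            "-f", f"body={body}")
    else:
        _gh("api", "-X", "POST", f"repos/{repo}/issues/{pr_number}/comments",
            "-f", f"body={body}")
```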
Add all 6 Go Docker images (3 apps × baseline + instrumented) to the docker-build CI matrix. Verified all 11 containers build successfully with act locally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Connect bench run CLI command to lib.runner.run_benchmark()
- Add .github/workflows/test-benchmark.yml for smoke testing
- Update .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add fail-fast: false so one matrix failure doesn't cancel all builds
- Render requirements-sentry.txt from .tmpl before docker build (strips the version pin so instrumented Dockerfiles can build in CI); see the sketch below

Verified all 13 jobs pass with act:
- lint, test (9/9)
- 11 docker-build matrix entries
- smoke-test: full benchmark ran end-to-end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
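Sketched as a workflow fragment; the job layout, matrix entries, template path, and sed expression are assumptions, not the exact ci.yml contents:

```yaml
# Illustrative fragment of the docker-build job; names and paths are assumed.
jobs:
  docker-build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false                       # one failing image no longer cancels the rest
      matrix:
        app: [python/django, go/net-http]    # abbreviated; the real matrix has 11 entries
    steps:
      - uses: actions/checkout@v4
      - name: Render requirements-sentry.txt
        # Strip the version pin from the template so instrumented Dockerfiles
        # can install the SDK in CI (template path and command are assumed).
        run: sed 's/==.*//' requirements-sentry.txt.tmpl > requirements-sentry.txt
```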
- Fix format_comment crash when per-endpoint metrics are missing ("N/A" string passed to the :+.2f format specifier); see the sketch below
- Fix benchmark.yml checking for results/report.md instead of results.json
- Fix results file path: the runner writes to results/{lang}-{framework}/, not the results/ root; use find to locate the file dynamically
- Add from __future__ import annotations for Python 3.9 compat
- Add test for missing endpoint metrics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
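A minimal sketch of the kind of guard that fixes the first item, with illustrative names rather than the actual format_comment internals:

```python
# Illustrative: only apply the numeric format specifier when the metric is
# actually a number; otherwise fall back to a placeholder cell.
def format_overhead_cell(value) -> str:
    if isinstance(value, (int, float)):
        return f"{value:+.2f}%"
    return "N/A"

# format_overhead_cell(3.456)  -> "+3.46%"
# format_overhead_cell("N/A")  -> "N/A"   (previously crashed inside :+.2f)
# format_overhead_cell(None)   -> "N/A"
```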
upload-artifact@v4 forbids forward slashes in artifact names. Since inputs.app contains slashes (e.g. python/django, go/net-http), use tr to replace / with - in the artifact name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
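A sketch of the workaround; the step layout, env handling, and results path are assumptions:

```yaml
      - name: Compute artifact name
        # upload-artifact@v4 rejects "/" in artifact names, so turn
        # e.g. "python/django" into "python-django".
        env:
          APP: ${{ inputs.app }}
        run: echo "ARTIFACT_NAME=$(echo "$APP" | tr '/' '-')" >> "$GITHUB_ENV"

      - uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ env.ARTIFACT_NAME }}
          path: results/
```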
- Move workflow inputs to env vars to prevent GitHub Actions script injection via untrusted expression interpolation in run: blocks (see the sketch below)
- Fix iterations field rendering as raw list instead of count (run_benchmark stores iterations as a list of dicts)
- Add test for iterations-as-list rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
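The hardening pattern is roughly the following; the CLI invocation and the sdk_version input name are assumptions:

```yaml
      # Unsafe: the expression is expanded directly into the shell script,
      # so a crafted input value could inject commands:
      #   run: ./bench run ${{ inputs.app }}
      #
      # Safer: pass inputs through environment variables so the shell treats
      # them as data rather than as part of the script.
      - name: Run benchmark
        env:
          APP: ${{ inputs.app }}
          SDK_VERSION: ${{ inputs.sdk_version }}   # input name assumed
        run: ./bench run "$APP" --sdk-version "$SDK_VERSION"
```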
Cursor Bugbot has reviewed your changes and found 2 potential issues.
- Add conditional setup-go step for Go apps so _prepare_sdk_version can run go get + go mod tidy on the host before Docker builds
- Use -maxdepth 2 when finding results.json to avoid picking up raw vegeta NDJSON files at deeper levels (see the sketch below)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
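Roughly, assuming the condition, action version, and paths shown here (all illustrative):

```yaml
      - name: Set up Go
        # Only for Go apps: lets _prepare_sdk_version run `go get` and
        # `go mod tidy` on the host before the Docker builds.
        if: startsWith(inputs.app, 'go/')
        uses: actions/setup-go@v5
        with:
          go-version: stable

      - name: Locate results file
        # The runner writes results/{lang}-{framework}/results.json; capping the
        # depth avoids matching raw vegeta NDJSON files deeper in the tree.
        run: echo "RESULTS_FILE=$(find results -maxdepth 2 -name results.json | head -n 1)" >> "$GITHUB_ENV"
```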
…uard
- Reusable workflow_call checks out the caller's repo by default, not sdk-benchmarks. Add explicit repository: getsentry/sdk-benchmarks to the checkout step so the bench CLI and configs are available (see the sketch below).
- Make the summary overhead type guard consistent with per-endpoint: use isinstance(value, (int, float)) instead of is not None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
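That is, the checkout step pins the repository explicitly; a sketch with the ref left at the action's default:

```yaml
      - name: Check out sdk-benchmarks
        # workflow_call runs in the caller's context, so a bare checkout would
        # fetch the SDK repo rather than the benchmark harness.
        uses: actions/checkout@v4
        with:
          repository: getsentry/sdk-benchmarks
```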
```python
    results = run_benchmark(app, sdk_version, iterations=iterations, output_dir=output_dir)
    click.echo(f"Benchmark complete. {len(results.get('iterations', []))} iterations recorded.")


@cli.command()
```
Bug: The run_benchmark function only outputs raw data, but format_comment expects computed summary statistics. This results in empty, useless benchmark reports being posted in PR comments.
Severity: CRITICAL
Suggested Fix
Implement the statistical analysis layer that was left as a stub. This involves adding logic to lib/metrics.py, lib/report.py, and lib/compare.py to process the raw iteration data from run_benchmark() and generate the summary, overhead, and regression fields that format_comment() requires to produce a meaningful report.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: bench.py#L26-L30
Potential issue: The `run_benchmark()` function writes raw iteration data to a JSON file
but does not compute the summary statistics (like `overhead`, `regression`, `endpoints`)
that the `format_comment()` function expects. Consequently, when `format_comment()`
reads this data, it finds no summary information. While the code avoids crashing by
using `.get()` with default values, the resulting PR comment is functionally useless,
containing empty tables and missing data. The core logic for statistical analysis,
intended for files like `lib/metrics.py` and `lib/report.py`, is completely missing,
breaking the data contract between the data generation and reporting steps.
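For illustration only, such a layer would have to reduce the raw iterations into the summary, overhead, and regression fields the formatter reads. A minimal sketch, assuming a simple per-iteration latency field; the field names, nesting, and threshold are not the repo's actual schema:

```python
# Hypothetical reduction from raw iteration data to report-ready fields.
# "latency_ms", the nesting, and the 10% threshold are all assumptions.
from statistics import mean

def summarize(results: dict) -> dict:
    iterations = results.get("iterations", [])
    baseline = [it["baseline"]["latency_ms"] for it in iterations if "baseline" in it]
    instrumented = [it["instrumented"]["latency_ms"] for it in iterations if "instrumented" in it]
    if not baseline or not instrumented:
        return {"summary": {}, "overhead": None, "regression": False}
    base_mean, inst_mean = mean(baseline), mean(instrumented)
    overhead_pct = (inst_mean - base_mean) / base_mean * 100
    return {
        "summary": {"baseline_ms": base_mean, "instrumented_ms": inst_mean},
        "overhead": overhead_pct,
        "regression": overhead_pct > 10.0,
    }
```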
Summary
Phase 4 of the implementation plan — GitHub Action integration.
- `.github/workflows/ci.yml`: lint, test, and Docker build checks for all 11 containers (Python + Go + tools)
- `.github/workflows/benchmark.yml`: reusable workflow for SDK repos to call from their CI
- `.github/workflows/test-benchmark.yml`: smoke test for the benchmark workflow
- `.github/actions/benchmark/action.yml`: composite action wrapper
- `lib/github.py`: sticky PR comment posting via `gh` CLI with per-endpoint overhead tables
- `bench.py`: wired `run` command to `lib.runner`, added `post-comment` CLI command
- `tests/test_github.py`: 5 tests for comment formatting

Verified locally with `act`:
- `lint` job passes
- `test` job passes (9/9 tests)
- `docker-build` job passes for all 11 matrix entries

Usage by SDK repos
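A call from an SDK repo's CI would look roughly like this (a sketch: the `@main` ref and the permissions block are assumptions, and the workflow may take additional inputs beyond `app`):

```yaml
# Sketch of an SDK repo calling the reusable benchmark workflow from its own CI.
jobs:
  benchmark:
    uses: getsentry/sdk-benchmarks/.github/workflows/benchmark.yml@main
    permissions:
      pull-requests: write   # assumed: needed to post the sticky PR comment
    with:
      app: python/django
```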
Test plan
- `act` locally
- `pytest tests/`: 9/9 pass
- `ruff check .`: clean

🤖 Generated with Claude Code