
feat: add GitHub Actions CI and reusable benchmark workflow #13

Merged
lcian merged 9 commits into main from feat/github-actions on Feb 17, 2026

Conversation

@giortzisg
Collaborator

Summary

Phase 4 of the implementation plan: GitHub Actions integration.

  • .github/workflows/ci.yml — lint, test, and Docker build checks for all 11 containers (Python + Go + tools)
  • .github/workflows/benchmark.yml — reusable workflow for SDK repos to call from their CI
  • .github/workflows/test-benchmark.yml — smoke test for the benchmark workflow
  • .github/actions/benchmark/action.yml — composite action wrapper
  • lib/github.py — sticky PR comment posting via gh CLI with per-endpoint overhead tables (sketched after this list)
  • bench.py — wired run command to lib.runner, added post-comment CLI command
  • tests/test_github.py — 5 tests for comment formatting
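
A rough sketch of the sticky-comment flow behind lib/github.py (the marker string, function name, and gh invocations here are assumptions for illustration, not the module's actual API): find an existing comment carrying a hidden marker and edit it in place, otherwise post a new one.

from __future__ import annotations

import subprocess

# Assumed marker; any stable HTML comment makes the report comment findable.
STICKY_MARKER = "<!-- sdk-benchmarks-report -->"


def post_sticky_comment(pr_number: int, report_markdown: str) -> None:
    """Create or update a single benchmark comment on a PR via the gh CLI."""
    body = f"{STICKY_MARKER}\n{report_markdown}"
    comments_json = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--json", "comments"],
        capture_output=True, text=True, check=True,
    ).stdout
    if STICKY_MARKER in comments_json:
        # --edit-last updates the current user's most recent comment; in CI the
        # bot account only ever posts the benchmark comment, so that is the one.
        cmd = ["gh", "pr", "comment", str(pr_number), "--edit-last", "--body", body]
    else:
        cmd = ["gh", "pr", "comment", str(pr_number), "--body", body]
    subprocess.run(cmd, check=True)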

Verified locally with act

  • lint job passes
  • test job passes (9/9 tests)
  • docker-build job passes for all 11 matrix entries:
    • django-baseline, django-instrumented
    • fakerelay, loadgen, postgres
    • go-net-http-baseline, go-net-http-instrumented
    • go-gin-baseline, go-gin-instrumented
    • go-echo-baseline, go-echo-instrumented

Usage by SDK repos

jobs:
  bench:
    uses: getsentry/sdk-benchmarks/.github/workflows/benchmark.yml@main
    with:
      app: python/django
      sdk-version: "git+https://github.com/${{ github.repository }}@${{ github.head_ref }}"

Test plan

  • All CI jobs verified with act locally
  • pytest tests/ — 9/9 pass
  • ruff check . clean

🤖 Generated with Claude Code

giortzisg and others added 3 commits February 17, 2026 15:21
…ents

Phase 4 implementation:
- .github/workflows/ci.yml: lint, test, and Docker build checks on PRs
- .github/workflows/benchmark.yml: reusable workflow for SDK repos to
  call from their CI to run benchmarks and post results
- .github/actions/benchmark/action.yml: composite action wrapper
- lib/github.py: sticky PR comment posting via gh CLI, with per-endpoint
  overhead tables and regression detection
- bench.py: add post-comment CLI command
- tests/test_github.py: tests for comment formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add all 6 Go Docker images (3 apps × baseline + instrumented) to the
docker-build CI matrix. Verified all 11 containers build successfully
with act locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Connect bench run CLI command to lib.runner.run_benchmark()
- Add .github/workflows/test-benchmark.yml for smoke testing
- Update .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add fail-fast: false so one matrix failure doesn't cancel all builds
- Render requirements-sentry.txt from .tmpl before docker build
  (strips version pin so instrumented Dockerfiles can build in CI)

Verified all 13 jobs pass with act:
- lint, test (9/9)
- 11 docker-build matrix entries
- smoke-test: full benchmark ran end-to-end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
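
The pin-stripping step mentioned in the commit above isn't shown in this thread; purely as an illustration (assuming requirements-sentry.txt.tmpl carries a pinned sentry-sdk line), the rendering amounts to something like:

import re
from pathlib import Path

# Assumption: the template pins a concrete release, e.g. "sentry-sdk==2.20.0".
# CI wants the bare package name so the instrumented image can install
# whichever SDK version the benchmark run requests.
tmpl = Path("requirements-sentry.txt.tmpl").read_text()
rendered = re.sub(r"^(sentry-sdk)==\S+\s*$", r"\1", tmpl, flags=re.MULTILINE)
Path("requirements-sentry.txt").write_text(rendered)
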
giortzisg and others added 2 commits February 17, 2026 15:46
- Fix format_comment crash when per-endpoint metrics are missing (N/A
  string passed to :+.2f format specifier)
- Fix benchmark.yml checking for results/report.md instead of results.json
- Fix results file path: runner writes to results/{lang}-{framework}/
  not results/ root — use find to locate the file dynamically
- Add from __future__ import annotations for Python 3.9 compat
- Add test for missing endpoint metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
upload-artifact@v4 forbids forward slashes in artifact names.
Since inputs.app contains slashes (e.g. python/django, go/net-http),
use tr to replace / with - in the artifact name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move workflow inputs to env vars to prevent GitHub Actions script
  injection via untrusted expression interpolation in run: blocks
- Fix iterations field rendering as raw list instead of count
  (run_benchmark stores iterations as a list of dicts)
- Add test for iterations-as-list rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


lcian and others added 2 commits February 17, 2026 16:19
- Add conditional setup-go step for Go apps so _prepare_sdk_version
  can run go get + go mod tidy on the host before Docker builds
- Use -maxdepth 2 when finding results.json to avoid picking up
  raw vegeta NDJSON files at deeper levels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
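
The -maxdepth 2 tweak above is done with shell find in the workflow; as a Python illustration of the same idea (the results/<lang>-<framework>/ layout is taken from the earlier commit message), only the first two levels under results/ are searched so deeper raw vegeta NDJSON files are never matched:

from __future__ import annotations

from pathlib import Path


def find_results_json(results_dir: str = "results") -> Path | None:
    """Mimic `find results -maxdepth 2 -name results.json`."""
    root = Path(results_dir)
    # Depth 1: results/results.json; depth 2: results/<lang>-<framework>/results.json.
    for pattern in ("results.json", "*/results.json"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    return None
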
…uard

- Reusable workflow_call checks out caller's repo by default, not
  sdk-benchmarks. Add explicit repository: getsentry/sdk-benchmarks
  to the checkout step so bench CLI and configs are available.
- Make summary overhead type guard consistent with per-endpoint:
  use isinstance(value, (int, float)) instead of is not None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
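
A small illustration of the type guard described in that commit (render_overhead is a hypothetical name, not the actual lib/github.py function); formatting only genuine numbers also covers the earlier "N/A passed to :+.2f" crash:

def render_overhead(value: object) -> str:
    # A missing metric may arrive as None or as a placeholder such as "N/A";
    # either would blow up a "+.2f" format, so only real numbers are formatted.
    if isinstance(value, (int, float)):
        return f"{value:+.2f}%"
    return "N/A"
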
Comment on lines +26 to 30
results = run_benchmark(app, sdk_version, iterations=iterations, output_dir=output_dir)
click.echo(f"Benchmark complete. {len(results.get('iterations', []))} iterations recorded.")


@cli.command()

Bug: The run_benchmark function only outputs raw data, but format_comment expects computed summary statistics. This results in empty, useless benchmark reports being posted in PR comments.
Severity: CRITICAL

Suggested Fix

Implement the statistical analysis layer that was left as a stub. This involves adding logic to lib/metrics.py, lib/report.py, and lib/compare.py to process the raw iteration data from run_benchmark() and generate the summary, overhead, and regression fields that format_comment() requires to produce a meaningful report.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: bench.py#L26-L30

Potential issue: The `run_benchmark()` function writes raw iteration data to a JSON file
but does not compute the summary statistics (like `overhead`, `regression`, `endpoints`)
that the `format_comment()` function expects. Consequently, when `format_comment()`
reads this data, it finds no summary information. While the code avoids crashing by
using `.get()` with default values, the resulting PR comment is functionally useless,
containing empty tables and missing data. The core logic for statistical analysis,
intended for files like `lib/metrics.py` and `lib/report.py`, is completely missing,
breaking the data contract between the data generation and reporting steps.
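
If the analysis layer is indeed missing, a minimal sketch of what it might compute from the raw iterations (key names such as baseline_p50_ms and overhead_pct are assumptions about the data shape, not the repository's actual schema):

from __future__ import annotations

from statistics import median


def summarize(iterations: list[dict]) -> dict:
    """Collapse raw iteration records into the summary shape format_comment reads."""
    endpoints: dict[str, dict] = {}
    for iteration in iterations:
        for name, metrics in iteration.get("endpoints", {}).items():
            bucket = endpoints.setdefault(name, {"baseline": [], "instrumented": []})
            bucket["baseline"].append(metrics["baseline_p50_ms"])
            bucket["instrumented"].append(metrics["instrumented_p50_ms"])
    for bucket in endpoints.values():
        base, inst = median(bucket["baseline"]), median(bucket["instrumented"])
        # Overhead as a percentage of the baseline median latency.
        bucket["overhead_pct"] = (inst - base) / base * 100 if base else None
    return {"iterations": len(iterations), "endpoints": endpoints}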

lcian merged commit 40ec1bb into main on Feb 17, 2026
19 checks passed