
feat: add GitHub Actions CI and reusable benchmark workflow #13

Merged
lcian merged 9 commits into main from feat/github-actions on Feb 17, 2026

Conversation

@giortzisg
Collaborator

Summary

Phase 4 of the implementation plan: GitHub Actions integration.

  • .github/workflows/ci.yml — lint, test, and Docker build checks for all 11 containers (Python + Go + tools)
  • .github/workflows/benchmark.yml — reusable workflow for SDK repos to call from their CI
  • .github/workflows/test-benchmark.yml — smoke test for the benchmark workflow
  • .github/actions/benchmark/action.yml — composite action wrapper
  • lib/github.py — sticky PR comment posting via gh CLI with per-endpoint overhead tables (sketched after this list)
  • bench.py — wired run command to lib.runner, added post-comment CLI command
  • tests/test_github.py — 5 tests for comment formatting
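
A rough sketch of the sticky-comment flow behind lib/github.py (the marker string, function name, and gh invocations here are assumptions for illustration, not the module's actual API): find an existing comment carrying a hidden marker and edit it in place, otherwise post a new one.

from __future__ import annotations

import subprocess

# Assumed marker; any stable HTML comment makes the report comment findable.
STICKY_MARKER = "<!-- sdk-benchmarks-report -->"


def post_sticky_comment(pr_number: int, report_markdown: str) -> None:
    """Create or update a single benchmark comment on a PR via the gh CLI."""
    body = f"{STICKY_MARKER}\n{report_markdown}"
    comments_json = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--json", "comments"],
        capture_output=True, text=True, check=True,
    ).stdout
    if STICKY_MARKER in comments_json:
        # --edit-last updates the current user's most recent comment; in CI the
        # bot account only ever posts the benchmark comment, so that is the one.
        cmd = ["gh", "pr", "comment", str(pr_number), "--edit-last", "--body", body]
    else:
        cmd = ["gh", "pr", "comment", str(pr_number), "--body", body]
    subprocess.run(cmd, check=True)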

Verified locally with act

  • lint job passes
  • test job passes (9/9 tests)
  • docker-build job passes for all 11 matrix entries:
    • django-baseline, django-instrumented
    • fakerelay, loadgen, postgres
    • go-net-http-baseline, go-net-http-instrumented
    • go-gin-baseline, go-gin-instrumented
    • go-echo-baseline, go-echo-instrumented

Usage by SDK repos

jobs:
  bench:
    uses: getsentry/sdk-benchmarks/.github/workflows/benchmark.yml@main
    with:
      app: python/django
      sdk-version: "git+https://github.com/${{ github.repository }}@${{ github.head_ref }}"

Test plan

  • All CI jobs verified with act locally
  • pytest tests/ — 9/9 pass
  • ruff check . clean

🤖 Generated with Claude Code

giortzisg and others added 3 commits February 17, 2026 15:21
…ents

Phase 4 implementation:
- .github/workflows/ci.yml: lint, test, and Docker build checks on PRs
- .github/workflows/benchmark.yml: reusable workflow for SDK repos to
  call from their CI to run benchmarks and post results
- .github/actions/benchmark/action.yml: composite action wrapper
- lib/github.py: sticky PR comment posting via gh CLI, with per-endpoint
  overhead tables and regression detection
- bench.py: add post-comment CLI command
- tests/test_github.py: tests for comment formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add all 6 Go Docker images (3 apps × baseline + instrumented) to the
docker-build CI matrix. Verified all 11 containers build successfully
with act locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Connect bench run CLI command to lib.runner.run_benchmark()
- Add .github/workflows/test-benchmark.yml for smoke testing
- Update .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add fail-fast: false so one matrix failure doesn't cancel all builds
- Render requirements-sentry.txt from .tmpl before docker build
  (strips version pin so instrumented Dockerfiles can build in CI)

Verified all 13 jobs pass with act:
- lint, test (9/9)
- 11 docker-build matrix entries
- smoke-test: full benchmark ran end-to-end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
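
The pin-stripping step mentioned in the commit above isn't shown in this thread; purely as an illustration (assuming requirements-sentry.txt.tmpl carries a pinned sentry-sdk line), the rendering amounts to something like:

import re
from pathlib import Path

# Assumption: the template pins a concrete release, e.g. "sentry-sdk==2.20.0".
# CI wants the bare package name so the instrumented image can install
# whichever SDK version the benchmark run requests.
tmpl = Path("requirements-sentry.txt.tmpl").read_text()
rendered = re.sub(r"^(sentry-sdk)==\S+\s*$", r"\1", tmpl, flags=re.MULTILINE)
Path("requirements-sentry.txt").write_text(rendered)
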
giortzisg and others added 2 commits February 17, 2026 15:46
- Fix format_comment crash when per-endpoint metrics are missing (N/A
  string passed to :+.2f format specifier)
- Fix benchmark.yml checking for results/report.md instead of results.json
- Fix results file path: runner writes to results/{lang}-{framework}/
  not results/ root — use find to locate the file dynamically
- Add from __future__ import annotations for Python 3.9 compat
- Add test for missing endpoint metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
upload-artifact@v4 forbids forward slashes in artifact names.
Since inputs.app contains slashes (e.g. python/django, go/net-http),
use tr to replace / with - in the artifact name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move workflow inputs to env vars to prevent GitHub Actions script
  injection via untrusted expression interpolation in run: blocks
- Fix iterations field rendering as raw list instead of count
  (run_benchmark stores iterations as a list of dicts)
- Add test for iterations-as-list rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


lcian and others added 2 commits February 17, 2026 16:19
- Add conditional setup-go step for Go apps so _prepare_sdk_version
  can run go get + go mod tidy on the host before Docker builds
- Use -maxdepth 2 when finding results.json to avoid picking up
  raw vegeta NDJSON files at deeper levels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
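
The -maxdepth 2 tweak above is done with shell find in the workflow; as a Python illustration of the same idea (the results/<lang>-<framework>/ layout is taken from the earlier commit message), only the first two levels under results/ are searched so deeper raw vegeta NDJSON files are never matched:

from __future__ import annotations

from pathlib import Path


def find_results_json(results_dir: str = "results") -> Path | None:
    """Mimic `find results -maxdepth 2 -name results.json`."""
    root = Path(results_dir)
    # Depth 1: results/results.json; depth 2: results/<lang>-<framework>/results.json.
    for pattern in ("results.json", "*/results.json"):
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    return None
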
…uard

- Reusable workflow_call checks out caller's repo by default, not
  sdk-benchmarks. Add explicit repository: getsentry/sdk-benchmarks
  to the checkout step so bench CLI and configs are available.
- Make summary overhead type guard consistent with per-endpoint:
  use isinstance(value, (int, float)) instead of is not None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
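
A small illustration of the type guard described in that commit (render_overhead is a hypothetical name, not the actual lib/github.py function); formatting only genuine numbers also covers the earlier "N/A passed to :+.2f" crash:

def render_overhead(value: object) -> str:
    # A missing metric may arrive as None or as a placeholder such as "N/A";
    # either would blow up a "+.2f" format, so only real numbers are formatted.
    if isinstance(value, (int, float)):
        return f"{value:+.2f}%"
    return "N/A"
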
Comment on lines +26 to 30
results = run_benchmark(app, sdk_version, iterations=iterations, output_dir=output_dir)
click.echo(f"Benchmark complete. {len(results.get('iterations', []))} iterations recorded.")


@cli.command()

Bug: The run_benchmark function only outputs raw data, but format_comment expects computed summary statistics. This results in empty, useless benchmark reports being posted in PR comments.
Severity: CRITICAL

Suggested Fix

Implement the statistical analysis layer that was left as a stub. This involves adding logic to lib/metrics.py, lib/report.py, and lib/compare.py to process the raw iteration data from run_benchmark() and generate the summary, overhead, and regression fields that format_comment() requires to produce a meaningful report.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: bench.py#L26-L30

Potential issue: The `run_benchmark()` function writes raw iteration data to a JSON file
but does not compute the summary statistics (like `overhead`, `regression`, `endpoints`)
that the `format_comment()` function expects. Consequently, when `format_comment()`
reads this data, it finds no summary information. While the code avoids crashing by
using `.get()` with default values, the resulting PR comment is functionally useless,
containing empty tables and missing data. The core logic for statistical analysis,
intended for files like `lib/metrics.py` and `lib/report.py`, is completely missing,
breaking the data contract between the data generation and reporting steps.
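
If the analysis layer is indeed missing, a minimal sketch of what it might compute from the raw iterations (key names such as baseline_p50_ms and overhead_pct are assumptions about the data shape, not the repository's actual schema):

from __future__ import annotations

from statistics import median


def summarize(iterations: list[dict]) -> dict:
    """Collapse raw iteration records into the summary shape format_comment reads."""
    endpoints: dict[str, dict] = {}
    for iteration in iterations:
        for name, metrics in iteration.get("endpoints", {}).items():
            bucket = endpoints.setdefault(name, {"baseline": [], "instrumented": []})
            bucket["baseline"].append(metrics["baseline_p50_ms"])
            bucket["instrumented"].append(metrics["instrumented_p50_ms"])
    for bucket in endpoints.values():
        base, inst = median(bucket["baseline"]), median(bucket["instrumented"])
        # Overhead as a percentage of the baseline median latency.
        bucket["overhead_pct"] = (inst - base) / base * 100 if base else None
    return {"iterations": len(iterations), "endpoints": endpoints}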

lcian merged commit 40ec1bb into main on Feb 17, 2026
19 checks passed