Automated benchmarking infrastructure for Apache DataFusion and Apache Arrow-rs. Triggered via PR comments, executed on GKE Autopilot, results posted back to the PR.
```
  GitHub PR comment              GitHub PR
 "run benchmark tpch"        (results posted)
           │                        ▲
           ▼                        │
┌──────────────────────────────────────────────────────┐
│             Controller Pod (StatefulSet)             │
│                                                      │
│  ┌───────────────┐       ┌───────────────────────┐   │
│  │ GitHub Poller │       │    Job Reconciler     │   │
│  │  (poll_loop)  │       │   (reconcile_loop)    │   │
│  │               │       │                       │   │
│  │ fetch comments│       │ pending → create Job  │   │
│  │ detect trigger│       │ running → check K8s   │   │
│  │ insert job    │       │ done    → post result │   │
│  └──────┬────────┘       └───────────┬───────────┘   │
│         │         SQLite             │               │
│         └───────┤ benchmark.db ├─────┘               │
└──────────────────────────────────────────────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   K8s Job    │
                    │  (spot node) │
                    │              │
                    │  clone repo  │
                    │ build & run  │
                    │ post results │
                    └──────────────┘
```
- The GitHub poller scans PR comments every 30s for trigger phrases
- Valid triggers insert a row into SQLite with status `pending`
- The job reconciler picks up pending rows and creates K8s Jobs
- Each Job runs on a Performance-class spot node, builds the project, runs benchmarks, and posts results back to the PR
- A scheduled CI workflow benchmarks the latest `main` every 6 hours
Comment on a PR in apache/datafusion or apache/arrow-rs:
```
run benchmarks
```
Or run specific benchmarks:
```
run benchmark tpch_mem clickbench_partitioned
```
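The first line of the comment is what the poller matches on. A minimal sketch of that detection, assuming a hypothetical `parse_trigger` helper (the real parser lives in `controller/src/benchmarks.rs` and also handles the YAML-style options shown below):

```rust
/// Detect a benchmark trigger in the first line of a PR comment.
/// Returns the requested benchmark names; an empty list means
/// "run benchmarks" (the default set). Illustrative sketch only.
fn parse_trigger(first_line: &str) -> Option<Vec<String>> {
    let line = first_line.trim();
    if line == "run benchmarks" {
        return Some(Vec::new()); // default benchmark set
    }
    // "run benchmark tpch tpch_mem" → ["tpch", "tpch_mem"]
    let rest = line.strip_prefix("run benchmark ")?;
    Some(rest.split_whitespace().map(String::from).collect())
}

fn main() {
    assert_eq!(parse_trigger("run benchmarks"), Some(vec![]));
    assert_eq!(
        parse_trigger("run benchmark tpch tpch_mem"),
        Some(vec!["tpch".to_string(), "tpch_mem".to_string()])
    );
    assert!(parse_trigger("lgtm, merging").is_none());
}
```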
By default, benchmarks compare the PR's merge base (baseline) against the PR head (changed). You can override either side with any git ref (branch, tag, or commit SHA):
```
run benchmark tpch
baseline:
  ref: v45.0.0
changed:
  ref: v46.0.0
```

You can also set just one side and leave the other as the default:
```
run benchmark tpch
baseline:
  ref: main
```

Environment variables can be set at three levels:
Shared (applies to both baseline and changed):
```
run benchmark tpch
env:
  RUST_LOG: debug
```

Per-side (different values for baseline vs changed):
```
run benchmark tpch
baseline:
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G
```

Combined (shared + per-side + custom refs):
```
run benchmark tpch clickbench_1
env:
  RUST_LOG: debug
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G
```

Per-side env vars override shared env vars when both set the same key.
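That precedence rule comes down to a map merge where the per-side map wins on key clashes. A minimal sketch, assuming a hypothetical `merge_env` helper (the controller's actual merge code may differ):

```rust
use std::collections::HashMap;

/// Merge shared env vars with one side's overrides.
/// Per-side values win when both maps set the same key.
fn merge_env(
    shared: &HashMap<String, String>,
    per_side: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = shared.clone();
    // `extend` overwrites existing keys, so per-side entries take precedence.
    merged.extend(per_side.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}

fn main() {
    let shared = HashMap::from([
        ("RUST_LOG".to_string(), "debug".to_string()),
        ("DATAFUSION_RUNTIME_MEMORY_LIMIT".to_string(), "1G".to_string()),
    ]);
    let changed = HashMap::from([
        ("DATAFUSION_RUNTIME_MEMORY_LIMIT".to_string(), "2G".to_string()),
    ]);
    let merged = merge_env(&shared, &changed);
    assert_eq!(merged["RUST_LOG"], "debug"); // shared value survives
    assert_eq!(merged["DATAFUSION_RUNTIME_MEMORY_LIMIT"], "2G"); // per-side wins
}
```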
View the queue:
```
show benchmark queue
```
| Benchmark | Description |
|---|---|
| `tpch` | TPC-H SF=1 |
| `tpch10` | TPC-H SF=10 |
| `tpch_mem` | TPC-H SF=1 in-memory |
| `tpch_mem10` | TPC-H SF=10 in-memory |
| `clickbench_partitioned` | ClickBench partitioned Parquet |
| `clickbench_extended` | ClickBench extended queries |
| `clickbench_1` | ClickBench single-file Parquet |
| `clickbench_pushdown` | ClickBench pushdown queries |
| `external_aggr` | External aggregation |
| `tpcds` | TPC-DS |
| `smj` | Sort-merge join |
| `sort_pushdown` | Sort pushdown |
| `sort_pushdown_sorted` | Sort pushdown sorted |
DataFusion micro-benchmarks: sql_planner, in_list, case_when, aggregate_vectorized, aggregate_query_sql, with_hashes, range_and_generate_series, sort, left, strpos, substr_index, character_length, reset_plan_states, replace, plan_reuse

Arrow-rs micro-benchmarks: arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, array_iter, array_from, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, comparison_kernels, csv_writer, coalesce_kernels, encoding, metadata, json-reader, ipc_reader, take_kernels, sort_kernel, interleave_kernels, union_array, variant_builder, variant_kernels, view_types, variant_validation, filter_kernels, concatenate_kernel, row_format, zip_kernels
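Requested benchmark names are checked against the per-repo allowlist before a job is queued. A minimal sketch of that lookup, with truncated lists and a hypothetical function name (the real allowlists live in `controller/src/benchmarks.rs`):

```rust
/// Check a benchmark name against the allowlist for its repository.
/// Lists here are abbreviated; the controller keeps the full sets above.
fn is_allowed(repo: &str, bench: &str) -> bool {
    let datafusion = ["tpch", "tpch_mem", "clickbench_1", "sql_planner"];
    let arrow_rs = ["arrow_reader", "cast_kernels", "filter_kernels"];
    match repo {
        "apache/datafusion" => datafusion.contains(&bench),
        "apache/arrow-rs" => arrow_rs.contains(&bench),
        _ => false, // unknown repos are never benchmarked
    }
}

fn main() {
    assert!(is_allowed("apache/datafusion", "tpch"));
    assert!(is_allowed("apache/arrow-rs", "cast_kernels"));
    assert!(!is_allowed("apache/arrow-rs", "tpch")); // wrong repo
    assert!(!is_allowed("evil/fork", "tpch")); // unwatched repo
}
```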
Monorepo with npm workspaces (TypeScript) and a Cargo workspace (Rust).
```
package.json          Root npm workspace (infra + services)
Cargo.toml            Root Cargo workspace (controller)
.github/workflows/    GitHub Actions (deploy, build, CI, scheduled benchmarks)
infra/                Pulumi stack: GCP resources (GKE, Artifact Registry, IAM)
services/             Pulumi stack: K8s deployments (controller StatefulSet)
controller/           Rust controller crate
  src/
    main.rs           Entry point — spawns poller + reconciler
    config.rs         Environment-based configuration
    models.rs         SQLite row types, GitHub API types
    github.rs         GitHub REST API client
    github_poller.rs  Comment polling loop
    job_manager.rs    K8s Job lifecycle reconciler
    db.rs             SQLite queries (jobs, seen comments, scan state)
    benchmarks.rs     Trigger parsing, benchmark allowlists
  migrations/         SQLite schema
runner/               Benchmark runner container (builds project, runs benchmarks, posts results)
  queries/            SQL query files for ClickBench
scripts/              Legacy benchmark scripts (reference)
```
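One detail of the layout above: `db.rs` tracks already-seen comments so the poller never triggers the same job twice. An in-memory sketch of that dedup (the real store is the SQLite `seen comments` table, and the type name here is hypothetical):

```rust
use std::collections::HashSet;

/// In-memory stand-in for the seen-comments table in db.rs.
struct SeenComments(HashSet<u64>);

impl SeenComments {
    fn new() -> Self {
        SeenComments(HashSet::new())
    }

    /// Record a comment id; returns true only the first time it is seen,
    /// which is when the poller should parse it for a trigger.
    fn mark_new(&mut self, comment_id: u64) -> bool {
        self.0.insert(comment_id)
    }
}

fn main() {
    let mut seen = SeenComments::new();
    assert!(seen.mark_new(42)); // first sighting → process
    assert!(!seen.mark_new(42)); // repeat → skip
    assert!(seen.mark_new(43)); // different comment → process
}
```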
```shell
cargo build --manifest-path controller/Cargo.toml
cargo fmt --check --manifest-path controller/Cargo.toml
cargo clippy --manifest-path controller/Cargo.toml -- -D warnings
cargo test --manifest-path controller/Cargo.toml
```

The controller needs a GitHub token and a runner image. SQLite is used for state (auto-created).
```shell
export GITHUB_TOKEN="ghp_..."
export RUNNER_IMAGE="us-docker.pkg.dev/.../runner:latest"

# Optional overrides (shown with defaults):
export DATABASE_URL="sqlite:///data/benchmark.db"
export WATCHED_REPOS="apache/datafusion:apache/arrow-rs"
export POLL_INTERVAL_SECS=2
export RECONCILE_INTERVAL_SECS=3
export K8S_NAMESPACE="benchmarking"
export DEFAULT_CPU=30
export DEFAULT_MEMORY="60Gi"

cargo run --manifest-path controller/Cargo.toml
```

Note: the reconciler requires in-cluster K8s access. Outside a cluster it will fail to create Jobs. The poller works standalone for testing comment detection.
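The variables above are read by `config.rs` with fall-back defaults. A minimal sketch of the pattern, assuming a hypothetical `env_or` helper (only the variable names and defaults come from this README):

```rust
use std::env;

/// Read a config value from the environment, falling back to a default.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // Defaults as documented above; real values win when the vars are set.
    let namespace = env_or("K8S_NAMESPACE", "benchmarking");
    let poll_secs: u64 = env_or("POLL_INTERVAL_SECS", "2")
        .parse()
        .expect("POLL_INTERVAL_SECS must be an integer");

    // With neither variable set, the defaults apply.
    assert!(!namespace.is_empty());
    assert!(poll_secs >= 1);
}
```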
Managed with Pulumi (TypeScript), split into two stacks connected via StackReference:
- `infra` — GCP resources: GKE Autopilot cluster, Artifact Registry, IAM (controller SA + WI binding)
- `services` — K8s resources: namespace, service account, secrets, controller StatefulSet
```shell
npm install                    # install all workspace deps from root
cd infra && pulumi preview     # dry-run infra
cd services && pulumi preview  # dry-run services
```

The dependency chain is: infra (`pulumi up`) → build images (`docker push`) → services (`pulumi up`).
Key resources:
- GKE Autopilot cluster with Performance-class spot instances
- Hyperdisk-balanced ephemeral volumes for fast compilation I/O
- Artifact Registry for runner container images
- Workload Identity Federation for keyless auth (GitHub Actions + controller pod)
One-time setup for GitHub Actions → GCP authentication:
- Run `infra/bootstrap.sh` (creates the WIF pool, OIDC provider, and `gha-deployer` SA via gcloud)
- Set the 3 GitHub repo variables printed by the script (`GCP_PROJECT_ID`, `GCP_WORKLOAD_IDENTITY_PROVIDER`, `GCP_SERVICE_ACCOUNT_EMAIL`)
- `gcloud auth application-default login && cd infra && pulumi up --stack dev`
- Push to main — GHA takes over from there
```
                   ┌─────────┐
  insert_job() ───►│ pending │
                   └────┬────┘
                        │ reconcile_pending()
                        │ create K8s Job
                        ▼
                   ┌─────────┐
                   │ running │
                   └────┬────┘
                        │ reconcile_active()
                        │ check K8s Job status
                  ┌─────┴──────┐
                  ▼            ▼
            ┌───────────┐ ┌────────┐
            │ completed │ │ failed │
            └───────────┘ └────────┘
```
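The diagram's transitions can be sketched as an enum and a step function; a minimal illustration, not the controller's actual types (the real logic lives in `job_manager.rs`):

```rust
/// Job states matching the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

/// One reconciler step for a job row.
fn next(status: JobStatus, k8s_job_succeeded: bool) -> JobStatus {
    match status {
        // reconcile_pending(): a K8s Job was created for this row.
        JobStatus::Pending => JobStatus::Running,
        // reconcile_active(): inspect the K8s Job's terminal condition.
        JobStatus::Running => {
            if k8s_job_succeeded {
                JobStatus::Completed
            } else {
                JobStatus::Failed
            }
        }
        // completed/failed are terminal.
        terminal => terminal,
    }
}

fn main() {
    assert_eq!(next(JobStatus::Pending, false), JobStatus::Running);
    assert_eq!(next(JobStatus::Running, true), JobStatus::Completed);
    assert_eq!(next(JobStatus::Running, false), JobStatus::Failed);
    assert_eq!(next(JobStatus::Completed, false), JobStatus::Completed);
}
```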
| Component | Role |
|---|---|
| `github_poller` | Poll GitHub API, detect triggers, insert jobs |
| `job_manager` | Create K8s Jobs, monitor status, post results |
| `db` | SQLite persistence (jobs, seen comments, scan state) |
| `benchmarks` | Trigger parsing, per-repo allowlists, classification |
| `github` | GitHub REST API client (comments, reactions) |
| `config` | Environment-based configuration |
We use Logfire for tracing and metrics.