DataFusion Benchmarking

Automated benchmarking infrastructure for Apache DataFusion and Apache Arrow-rs. Benchmarks are triggered via PR comments and executed on GKE Autopilot, and their results are posted back to the PR.

Architecture

GitHub PR comment                        GitHub PR
  "run benchmark tpch"                   (results posted)
        │                                       ▲
        ▼                                       │
┌──────────────────────────────────────────────────────┐
│  Controller Pod (StatefulSet)                        │
│                                                      │
│  ┌──────────────┐       ┌───────────────────────┐   │
│  │ GitHub Poller │       │ Job Reconciler        │   │
│  │ (poll_loop)   │       │ (reconcile_loop)      │   │
│  │               │       │                       │   │
│  │ fetch comments│       │ pending  → create Job │   │
│  │ detect trigger│       │ running  → check K8s  │   │
│  │ insert job    │       │ done     → post result│   │
│  └──────┬────────┘       └───────────┬───────────┘   │
│         │    SQLite                  │               │
│         └──────────┤ benchmark.db ├──┘               │
└──────────────────────────────────────────────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  K8s Job     │
                    │  (spot node) │
                    │              │
                    │ clone repo   │
                    │ build & run  │
                    │ post results │
                    └──────────────┘

How It Works

The GitHub poller scans PR comments every 30s for trigger phrases
Valid triggers insert a row into SQLite with status pending
The job reconciler picks up pending rows and creates K8s Jobs
Each Job runs on a Performance-class spot node, builds the project, runs benchmarks, and posts results back to the PR
A scheduled CI workflow benchmarks the latest main every 6 hours

Triggering Benchmarks

Comment on a PR in apache/datafusion or apache/arrow-rs:

run benchmarks

Or run specific benchmarks:

run benchmark tpch_mem clickbench_partitioned
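
The trigger phrases can be recognized with a small parser. The following is a minimal sketch under an assumed grammar (the trigger must be the first line of the comment, with no extra words after "run benchmarks"); the `Trigger` type and `parse_trigger` function are hypothetical names for illustration, and the real parsing in controller/src/benchmarks.rs may differ.

```rust
/// What a PR comment asks the controller to do (hypothetical type).
#[derive(Debug, PartialEq)]
enum Trigger {
    /// "run benchmarks": run the default set for the repository.
    RunDefault,
    /// "run benchmark <name> [<name> ...]": run only the named benchmarks.
    RunNamed(Vec<String>),
    /// "show benchmark queue": report the current queue.
    ShowQueue,
}

/// Parse the first line of a comment body.
/// Returns None when the line is not a recognized trigger.
/// Assumed grammar only; not the controller's actual parser.
fn parse_trigger(comment: &str) -> Option<Trigger> {
    let first_line = comment.lines().next()?;
    let words: Vec<&str> = first_line.split_whitespace().collect();
    match words.as_slice() {
        ["run", "benchmarks"] => Some(Trigger::RunDefault),
        ["run", "benchmark", names @ ..] if !names.is_empty() => Some(Trigger::RunNamed(
            names.iter().map(|name| name.to_string()).collect(),
        )),
        ["show", "benchmark", "queue"] => Some(Trigger::ShowQueue),
        _ => None,
    }
}
```

Only the first line is inspected here, so any configuration lines that follow the trigger in the same comment would be handled by a separate step.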

Comparing specific branches or commits

By default, benchmarks compare the PR's merge base (baseline) against the PR head (changed). You can override either side with any git ref (branch, tag, or commit SHA):

run benchmark tpch
baseline:
  ref: v45.0.0
changed:
  ref: v46.0.0

You can also set just one side and leave the other as the default:

run benchmark tpch
baseline:
  ref: main
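
The defaulting rule (merge base for baseline, PR head for changed, each overridable on its own) can be sketched as follows. `SideSpec`, `ResolvedRefs`, and `resolve_refs` are hypothetical names for illustration, not the controller's actual types.

```rust
/// One side of a comparison as written in the comment. `git_ref` is None
/// when the comment does not override that side. (Hypothetical type.)
#[derive(Debug, Default)]
struct SideSpec {
    git_ref: Option<String>,
}

/// The refs that would actually be checked out and compared.
#[derive(Debug, PartialEq)]
struct ResolvedRefs {
    baseline: String,
    changed: String,
}

/// Fill in defaults: baseline falls back to the PR's merge base,
/// changed falls back to the PR head.
fn resolve_refs(
    baseline: &SideSpec,
    changed: &SideSpec,
    merge_base: &str,
    pr_head: &str,
) -> ResolvedRefs {
    ResolvedRefs {
        baseline: baseline
            .git_ref
            .clone()
            .unwrap_or_else(|| merge_base.to_string()),
        changed: changed
            .git_ref
            .clone()
            .unwrap_or_else(|| pr_head.to_string()),
    }
}
```

With only `baseline.ref: main` set, as in the example above, this sketch compares `main` against the PR head.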

Per-side environment variables

Environment variables can be set at three levels:

Shared (applies to both baseline and changed):

run benchmark tpch
env:
  RUST_LOG: debug

Per-side (different values for baseline vs changed):

run benchmark tpch
baseline:
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G

Combined (shared + per-side + custom refs):

run benchmark tpch clickbench_1
env:
  RUST_LOG: debug
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G

Per-side env vars override shared env vars when both set the same key.
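
The override rule amounts to a map merge in which the per-side map is applied last. This is a minimal sketch; the `Env` alias and `effective_env` function are hypothetical names, not taken from the controller.

```rust
use std::collections::BTreeMap;

/// Environment variables as key/value pairs (hypothetical alias).
type Env = BTreeMap<String, String>;

/// Effective environment for one side: start from the shared map, then
/// apply the per-side map so per-side values win when both set a key.
fn effective_env(shared: &Env, side: &Env) -> Env {
    let mut merged = shared.clone();
    // `extend` replaces the value for any key that already exists.
    merged.extend(side.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```

Calling it once with the baseline map and once with the changed map yields the two environments for a comparison run.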

View the queue

Comment on a PR to see the jobs currently in the queue:

show benchmark queue

Supported Benchmarks

DataFusion (standard)

Benchmark	Description
`tpch`	TPC-H SF=1
`tpch10`	TPC-H SF=10
`tpch_mem`	TPC-H SF=1 in-memory
`tpch_mem10`	TPC-H SF=10 in-memory
`clickbench_partitioned`	ClickBench partitioned Parquet
`clickbench_extended`	ClickBench extended queries
`clickbench_1`	ClickBench single-file Parquet
`clickbench_pushdown`	ClickBench pushdown queries
`external_aggr`	External aggregation
`tpcds`	TPC-DS
`smj`	Sort-merge join
`sort_pushdown`	Sort pushdown
`sort_pushdown_sorted`	Sort pushdown sorted

DataFusion (criterion)

sql_planner, in_list, case_when, aggregate_vectorized, aggregate_query_sql, with_hashes, range_and_generate_series, sort, left, strpos, substr_index, character_length, reset_plan_states, replace, plan_reuse

Arrow-rs (criterion)

arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, array_iter, array_from, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, comparison_kernels, csv_writer, coalesce_kernels, encoding, metadata, json-reader, ipc_reader, take_kernels, sort_kernel, interleave_kernels, union_array, variant_builder, variant_kernels, view_types, variant_validation, filter_kernels, concatenate_kernel, row_format, zip_kernels
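
Since each repository accepts only its own benchmark names, a request can be checked against a per-repo allowlist and classified as standard or criterion in one step. The sketch below uses abbreviated lists and hypothetical names (`Repo`, `Kind`, `classify`); the real allowlists live in controller/src/benchmarks.rs and may be organized differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repo {
    DataFusion,
    ArrowRs,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Standard,
    Criterion,
}

// Abbreviated allowlists for illustration; see the full lists above.
const DATAFUSION_STANDARD: &[&str] = &["tpch", "tpch_mem", "clickbench_1", "tpcds"];
const DATAFUSION_CRITERION: &[&str] = &["sql_planner", "in_list", "case_when"];
const ARROW_RS_CRITERION: &[&str] = &["arrow_reader", "cast_kernels", "filter_kernels"];

/// Returns the benchmark kind if `name` is allowed for `repo`, else None.
fn classify(repo: Repo, name: &str) -> Option<Kind> {
    match repo {
        Repo::DataFusion if DATAFUSION_STANDARD.contains(&name) => Some(Kind::Standard),
        Repo::DataFusion if DATAFUSION_CRITERION.contains(&name) => Some(Kind::Criterion),
        Repo::ArrowRs if ARROW_RS_CRITERION.contains(&name) => Some(Kind::Criterion),
        _ => None,
    }
}
```

A name that is valid for one repository, such as `tpch`, is rejected when requested on the other.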

Project Structure

Monorepo with npm workspaces (TypeScript) and a Cargo workspace (Rust).

package.json               Root npm workspace (infra + services)
Cargo.toml                 Root Cargo workspace (controller)
.github/workflows/         GitHub Actions (deploy, build, CI, scheduled benchmarks)
infra/                     Pulumi stack: GCP resources (GKE, Artifact Registry, IAM)
services/                  Pulumi stack: K8s deployments (controller StatefulSet)
controller/                Rust controller crate
  src/
    main.rs                Entry point — spawns poller + reconciler
    config.rs              Environment-based configuration
    models.rs              SQLite row types, GitHub API types
    github.rs              GitHub REST API client
    github_poller.rs       Comment polling loop
    job_manager.rs         K8s Job lifecycle reconciler
    db.rs                  SQLite queries (jobs, seen comments, scan state)
    benchmarks.rs          Trigger parsing, benchmark allowlists
  migrations/              SQLite schema
runner/                    Benchmark runner container (builds project, runs benchmarks, posts results)
queries/                   SQL query files for ClickBench
scripts/                   Legacy benchmark scripts (reference)

Building

cargo build --manifest-path controller/Cargo.toml
cargo fmt --check --manifest-path controller/Cargo.toml
cargo clippy --manifest-path controller/Cargo.toml -- -D warnings

Testing

cargo test --manifest-path controller/Cargo.toml

Running Locally

The controller needs a GitHub token and a runner image. State is kept in a SQLite database, which is created automatically.

export GITHUB_TOKEN="ghp_..."
export RUNNER_IMAGE="us-docker.pkg.dev/.../runner:latest"

# Optional overrides (shown with defaults):
export DATABASE_URL="sqlite:///data/benchmark.db"
export WATCHED_REPOS="apache/datafusion:apache/arrow-rs"
export POLL_INTERVAL_SECS=2
export RECONCILE_INTERVAL_SECS=3
export K8S_NAMESPACE="benchmarking"
export DEFAULT_CPU=30
export DEFAULT_MEMORY="60Gi"

cargo run --manifest-path controller/Cargo.toml

Note: the reconciler requires in-cluster K8s access. Outside a cluster it will fail to create jobs. The poller works standalone for testing comment detection.
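
Environment-based configuration with required values and defaults can be sketched as below. This is an illustration only: the `Config` struct, its fields, and `load_config` are hypothetical, it covers just a few of the variables listed above, and the real controller/src/config.rs may differ. The lookup is passed in as a closure so the logic can be exercised without modifying the process environment.

```rust
use std::time::Duration;

/// A subset of the controller settings (hypothetical struct).
#[derive(Debug, PartialEq)]
struct Config {
    github_token: String,
    runner_image: String,
    k8s_namespace: String,
    poll_interval: Duration,
}

/// Build a Config from a key lookup. In a real binary the lookup would
/// typically be `|key| std::env::var(key).ok()`.
fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // Required variables produce an error when missing.
    let required = |key: &str| lookup(key).ok_or_else(|| format!("{} must be set", key));

    // Optional variables fall back to the defaults listed above.
    let poll_secs = match lookup("POLL_INTERVAL_SECS") {
        Some(raw) => raw
            .parse::<u64>()
            .map_err(|e| format!("POLL_INTERVAL_SECS: {}", e))?,
        None => 2,
    };

    Ok(Config {
        github_token: required("GITHUB_TOKEN")?,
        runner_image: required("RUNNER_IMAGE")?,
        k8s_namespace: lookup("K8S_NAMESPACE").unwrap_or_else(|| "benchmarking".to_string()),
        poll_interval: Duration::from_secs(poll_secs),
    })
}
```

Failing fast on a missing GITHUB_TOKEN or RUNNER_IMAGE surfaces a misconfiguration at startup rather than on the first poll.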

Infrastructure

Managed with Pulumi (TypeScript), split into two stacks connected via StackReference:

infra — GCP resources: GKE Autopilot cluster, Artifact Registry, IAM (controller SA + WI binding)
services — K8s resources: namespace, service account, secrets, controller StatefulSet

npm install                       # install all workspace deps from root
(cd infra && pulumi preview)      # dry-run infra
(cd services && pulumi preview)   # dry-run services

The dependency chain is: infra (pulumi up) → build images (docker push) → services (pulumi up).

Key resources:

GKE Autopilot cluster with Performance-class spot instances
Hyperdisk-balanced ephemeral volumes for fast compilation I/O
Artifact Registry for runner container images
Workload Identity Federation for keyless auth (GitHub Actions + controller pod)

Bootstrapping

One-time setup for GitHub Actions → GCP authentication:

Run infra/bootstrap.sh (creates WIF pool, OIDC provider, gha-deployer SA via gcloud)
Set the 3 GitHub repo variables printed by the script (GCP_PROJECT_ID, GCP_WORKLOAD_IDENTITY_PROVIDER, GCP_SERVICE_ACCOUNT_EMAIL)
Authenticate locally and apply the infra stack once: gcloud auth application-default login && cd infra && pulumi up --stack dev
Push to main — GHA takes over from there

Design

Job Lifecycle

                     ┌─────────┐
   insert_job() ───► │ pending │
                     └────┬────┘
                          │ reconcile_pending()
                          │ create K8s Job
                          ▼
                     ┌─────────┐
                     │ running │
                     └────┬────┘
                          │ reconcile_active()
                          │ check K8s Job status
                    ┌─────┴──────┐
                    ▼            ▼
              ┌───────────┐ ┌────────┐
              │ completed │ │ failed │
              └───────────┘ └────────┘
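
The lifecycle in the diagram can be expressed as a small transition function. The sketch below is a simplified model, not the controller's code: `JobStatus`, `K8sObservation`, and `next_status` are hypothetical names, and the observation enum stands in for whatever the reconciler reads from the K8s API.

```rust
/// Job status as stored in the jobs table (names follow the diagram).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

/// What the reconciler learned about the K8s Job (simplified, hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum K8sObservation {
    /// The K8s Job was just created for a pending row.
    Created,
    /// The K8s Job exists and has not finished yet.
    StillActive,
    Succeeded,
    Failed,
}

/// Compute the next status. Transitions not shown in the diagram leave
/// the status unchanged, so terminal states stay terminal.
fn next_status(current: JobStatus, seen: K8sObservation) -> JobStatus {
    match (current, seen) {
        (JobStatus::Pending, K8sObservation::Created) => JobStatus::Running,
        (JobStatus::Running, K8sObservation::Succeeded) => JobStatus::Completed,
        (JobStatus::Running, K8sObservation::Failed) => JobStatus::Failed,
        (other, _) => other,
    }
}
```

Keeping the transition logic in a pure function like this makes it easy to unit test separately from the K8s and SQLite calls.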

Component Responsibilities

Component	Role
`github_poller`	Poll GitHub API, detect triggers, insert jobs
`job_manager`	Create K8s Jobs, monitor status, post results
`db`	SQLite persistence (jobs, seen comments, scan state)
`benchmarks`	Trigger parsing, per-repo allowlists, classification
`github`	GitHub REST API client (comments, reactions)
`config`	Environment-based configuration

Observability

We use Logfire for tracing and metrics.
