
ci: enable cluster + CLI integration tests #242

Merged
kacy merged 1 commit into main from feat/cluster-server-wiring on Feb 22, 2026


@kacy (owner) commented Feb 22, 2026

summary

removes all --skip filters from CI, enabling the full integration test suite:

  • 29 cluster integration tests (topology, slot routing, MOVED redirects, replication, failover)
  • 7 CLI integration tests (oneshot, connection, cluster info, benchmark smoke)
  • 43 existing tests (basic ops, data types, persistence, pubsub, auth, proto)

the cluster wiring was already complete — these tests were skipped during
development and never re-enabled.

what changed

tests/integration/src/helpers.rs — deterministic port allocation for
cluster-enabled test servers. a PID-seeded atomic counter hands out blocks of
3 consecutive ports (data, gossip at +1, raft at +2). this closes the window
between the OS handing out a free port and the server actually binding it,
which caused flaky "connection reset by peer" failures under parallel execution.
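the allocation scheme can be sketched roughly as follows. this is a minimal illustration, not the code in helpers.rs: the names `allocate_cluster_ports` and `ClusterPorts`, and the 20000-49999 port range, are hypothetical. only the PID seed, the shared atomic counter and the 3-port block layout come from the description above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::OnceLock;

// Hypothetical range reserved for test servers; the real bounds are not stated.
const PORT_BASE: u32 = 20_000;
const PORT_SPAN: u32 = 30_000;
// One block per server: data port, gossip at +1, raft at +2.
const BLOCK_SIZE: u32 = 3;
const NUM_BLOCKS: u32 = PORT_SPAN / BLOCK_SIZE;

/// Ports reserved for one cluster-enabled test server (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClusterPorts {
    pub data: u16,
    pub gossip: u16,
    pub raft: u16,
}

/// Process-wide index of the next block, seeded from the PID on first use so
/// that separate test processes are likely to start at different offsets.
fn next_block() -> &'static AtomicU32 {
    static NEXT: OnceLock<AtomicU32> = OnceLock::new();
    NEXT.get_or_init(|| AtomicU32::new(std::process::id() % NUM_BLOCKS))
}

/// Hands out the next block of 3 consecutive ports. `fetch_add` is atomic, so
/// parallel test threads in one process cannot receive the same block unless
/// the counter cycles through the entire range.
pub fn allocate_cluster_ports() -> ClusterPorts {
    let block = next_block().fetch_add(1, Ordering::Relaxed) % NUM_BLOCKS;
    let data = (PORT_BASE + block * BLOCK_SIZE) as u16;
    ClusterPorts { data, gossip: data + 1, raft: data + 2 }
}
```

a test that starts a three-node cluster would call `allocate_cluster_ports()` once per node. the PID seed only makes overlap between two separate test processes unlikely; it does not reserve the ports against unrelated programs on the machine.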

.github/workflows/ci.yml — split the test step into:

  • unit tests: runs all workspace crate tests at full parallelism
  • integration tests: runs the subprocess-based integration suite with
    --test-threads=2 (each test spawns 1-3 ember-server processes, so higher
    parallelism causes resource contention on CI runners)

what was tested

  • cargo build --workspace --features protobuf,grpc — clean
  • cargo test --workspace --features protobuf,grpc --exclude ember-integration-tests — 387+ unit tests pass
  • cargo test -p ember-integration-tests --test integration -- --test-threads=2 — all 79 integration tests pass, stable across repeated runs
  • cargo clippy --workspace --features protobuf,grpc -- -D warnings — clean
  • cargo fmt --all --check — clean

design considerations

considered higher thread counts (4, 8) but they produce intermittent failures
from resource contention when ~50 ember-server subprocesses run simultaneously.
--test-threads=2 is the sweet spot — reliable while keeping total integration
test time under 10 seconds.

remove all --skip filters from CI and run the full integration test
suite (79 tests including 29 cluster and 7 CLI tests).

two changes make this reliable:

- deterministic port allocation for cluster servers: a PID-seeded
  atomic counter hands out blocks of 3 consecutive ports (data,
  gossip, raft), eliminating the race between find_free_port() and
  the server binding its cluster ports.

- --test-threads=2 for integration tests: each cluster test spawns
  1-3 ember-server subprocesses with tokio runtimes, raft, and gossip.
  running more than 2 tests concurrently causes resource contention and
  connection resets on shared CI runners.
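for context, the race being removed looks roughly like this. the body of `find_free_port()` below is a hypothetical reconstruction of the common bind-to-port-0 pattern, not the actual helper from this repository; `derived_cluster_ports` is likewise illustrative.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical `find_free_port()`-style helper: ask the OS for an ephemeral
/// port by binding to port 0, then read back the number it chose.
fn find_free_port() -> io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    // `listener` is dropped when this function returns, releasing the port.
    // Nothing reserves it until the spawned server calls bind() itself, and
    // the OS may hand the same port to another test in the meantime.
    Ok(port)
}

/// A cluster server would also bind `data + 1` (gossip) and `data + 2` (raft),
/// neither of which was checked for availability. Returns None if the
/// derived ports would overflow the u16 port space.
fn derived_cluster_ports(data: u16) -> Option<(u16, u16, u16)> {
    Some((data, data.checked_add(1)?, data.checked_add(2)?))
}
```

the PID-seeded counter sidesteps this window: ports come from a block reserved by the counter rather than from a probe-then-release against the OS, so sibling tests in the same run cannot be handed the same ports.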

unit tests are split into a separate step so they still run at full
parallelism.
@kacy merged commit 6f7af4e into main on Feb 22, 2026
5 of 7 checks passed
@kacy deleted the feat/cluster-server-wiring branch on February 22, 2026 at 22:35