Fix flaky test: test_multiple_clients_subscription #1637

Open
@sanity

Description

The integration test test_multiple_clients_subscription in crates/core/tests/operations.rs is failing intermittently in CI but passes locally. The test has been temporarily disabled to unblock PR merges.

Failure Details

  • Test location: crates/core/tests/operations.rs::test_multiple_clients_subscription
  • CI failure: Occurs when running with --no-default-features --features trace,websocket,redb
  • Error: Timeout waiting for put response, raised after ~83 seconds
  • Expected timeout: The test has a 120-second timeout, but in CI it fails at ~83 seconds, before that limit is reached

Symptoms

The test fails with:

error: anyhow::Error - Timeout waiting for put response

Looking at the logs, there are connection issues between nodes:

  • "Connection failed: TransportError(ConnectionEstablishmentFailure { cause: "connection attempt already in progress" })"
  • "Unable to forward or accept any connections"
  • "failed notifying, channel closed"

Test Description

The test:

  1. Sets up a 3-node network (Node A, Node B as gateway, Node C)
  2. Node A puts a contract and subscribes to it
  3. Node C (Client 2) subscribes to the same contract
  4. Client 1 updates the contract
  5. Both clients should receive update notifications

The failure occurs when Client 2 tries to put an update to the contract.

Potential Causes

  1. Race condition: The test may have timing assumptions that don't hold in CI
  2. Resource constraints: CI environment may be slower/more constrained
  3. Connection establishment: The "connection attempt already in progress" errors suggest connection state management issues
  4. Network topology: The test creates a specific network topology that may not stabilize properly in CI
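If causes 1 or 4 are involved, one option is to have the test wait on an observable condition (for example, "all three nodes report a connection") instead of relying on timing. The following is a minimal, synchronous sketch using only the standard library. `wait_until` is a hypothetical helper, not an existing function in the freenet crates, and an async test would need an equivalent built on its runtime's sleep and timeout primitives.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of the freenet codebase).
/// Polls `condition` every `interval` until it returns true or `timeout`
/// elapses. Returns true if the condition was met, false on timeout.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut condition: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-satisfied condition returns immediately.
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A test could call this before the first put, with a closure that checks whatever readiness signal the nodes expose, and fail with a descriptive message if it returns false.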

Reproduction

The test passes locally but fails in GitHub Actions CI. To attempt a local reproduction with the same feature flags as CI, run:

cargo test -p freenet --test operations test_multiple_clients_subscription --no-default-features --features trace,websocket,redb

Suggested Fixes

  1. Investigate connection establishment logic to fix "connection attempt already in progress" errors
  2. Add more robust retry logic for put operations in the test
  3. Increase timeouts or add explicit synchronization points
  4. Review recent changes to connection handling (commit feccbde mentions fixing integration test failures)
  5. Consider breaking the test into smaller, more focused tests
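As an illustration of suggested fix 2, the sketch below retries a fallible operation with exponential backoff. It is a generic, standard-library-only example: `retry_with_backoff` is a hypothetical name, the closure stands in for whatever the test uses to send a put and wait for its response, and the attempt count and delays are placeholders rather than tuned values. Note that retries can hide the underlying connection problem, so this fits best alongside fix 1 rather than instead of it.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of the freenet codebase).
/// Calls `op` up to `max_attempts` times, doubling the delay after each
/// failure. Returns the first success, or the last error if every attempt fails.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

The closure receives the 1-based attempt number, which makes it easy to log each retry and see in CI output how many attempts a put needed.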

Related

  • Recent commit feccbde: "fix: integration test failures and improve connection establishment latency"
  • This suggests the issue has been ongoing and may require deeper investigation into the connection establishment code
