

@hank95179
Contributor

This PR fixes the flaky test `test pubsub shardchannels_true`, tracked in #4925.

The test was intermittently failing with a TimeoutError in CI environments under high load. This suggests that the pubsubShardChannels command sometimes takes longer than the default client timeout to respond, most likely because of transient server load or network congestion, so the test fails prematurely even though the command itself would eventually succeed.
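
To make the failure mode concrete, here is a minimal model (not the client library's actual code) of a client-side deadline: the command's reply races a timer, and when the server answers too slowly the caller sees a TimeoutError even though the reply would have arrived later. The `withTimeout` and `slowReply` helpers and the `TimeoutError` class are hypothetical stand-ins chosen for illustration.

```typescript
// Hypothetical stand-in for the client's timeout error (illustration only).
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Models a client-side deadline: the reply races a timer of `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`no reply within ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Simulated server reply that arrives after `latencyMs` (e.g. under CI load).
function slowReply<T>(value: T, latencyMs: number): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), latencyMs));
}
```

A fast reply passes through unchanged, while the same reply delayed past the deadline surfaces as a TimeoutError, which is the transient failure the retry below is meant to absorb.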

Changes

  • Implemented a polling retry mechanism with error handling.
  • Wrapped the pubsubShardChannels call in a try-catch block within a retry loop.
  • If the command times out or fails transiently, the test now catches the error and retries (up to a limit), making it robust against transient performance dips and network timeouts.
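
The retry loop described above can be sketched as a small generic helper. This is a minimal sketch, not the code from this PR's diff: the name `retryTransient`, the `RetryOptions` fields and the `isTimeout` predicate are hypothetical, and the errors are assumed to carry `name === "TimeoutError"`.

```typescript
// Hypothetical helper: names, options and defaults are illustrative only.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  delayMs: number; // pause between attempts
  isTransient: (err: unknown) => boolean; // which errors are worth retrying
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function retryTransient<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-transient errors (e.g. an unexpected reply) fail immediately.
      if (!opts.isTransient(err)) throw err;
      lastError = err;
      if (attempt < opts.maxAttempts) await sleep(opts.delayMs);
    }
  }
  // Retries exhausted: rethrow the last timeout so the failure stays visible.
  throw lastError;
}

// Assumption: timeouts are recognizable by their error name.
const isTimeout = (err: unknown): boolean =>
  err instanceof Error && err.name === "TimeoutError";
```

In the test, the call might then look roughly like `await retryTransient(() => client.pubsubShardChannels(), { maxAttempts: 5, delayMs: 100, isTransient: isTimeout })`, where `client` is assumed to be the cluster client the test already uses and the attempt count and delay are placeholders. Retrying only on timeouts keeps genuine regressions, such as a wrong channel list, failing on the first attempt.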

Verification

  • Verified locally that the test passes consistently with the added retry logic, even when simulating high load conditions that previously triggered timeouts.

Related Issue

Fixes #4925

hank95179 requested a review from a team as a code owner on November 21, 2025 07:23
hank95179 force-pushed the fix/node-flaky-pubsub-shardchannels-4925 branch from a1c8b5a to c983279 on November 21, 2025 07:36
Signed-off-by: hank95179 <hank95179@gmail.com>
hank95179 force-pushed the fix/node-flaky-pubsub-shardchannels-4925 branch from c983279 to 0cb8899 on November 21, 2025 08:12

Development

Successfully merging this pull request may close these issues.

[Node][Flaky Test] PubSub › test pubsub shardchannels_true