Flaky test_shuffle.py::test_flaky_connect_recover_with_retry #8555

@crusaderky

This test is very flaky:
3 out of 11 runs failed: test_flaky_connect_recover_with_retry (distributed.shuffle.tests.test_shuffle)

e.g. https://github.com/dask/distributed/actions/runs/8156647196/job/22294610970?pr=8551

>       assert len(logs) < 600
E       AssertionError: assert 656 < 600
E        +  where 656 = len('Retrying <function ShuffleRun.send.<locals>._send at 0x7fa57efcb4c0> after exception in attempt 0/1: Timed out trying to connect to tcp://127.0.0.1:37919 after 0 s\nRetrying <function ShuffleRun.send.<locals>._send at 0x7fa56eb03160> after exception in attempt 0/1: Timed out trying to connect to tcp://127.0.0.1:37919 after 0 s\nRetrying <function ShuffleRun.send.<locals>._send at 0x7fa57d0ba700> after exception in attempt 0/1: Timed out trying to connect to tcp://127.0.0.1:38667 after 0 s\nRetrying <function ShuffleRun.send.<locals>._send at 0x7fa57d576670> after exception in attempt 0/1: Timed out trying to connect to tcp://127.0.0.1:38667 after 0 s\n')

This is a regression introduced in #8511.
CC @hendrikmakait
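
For reference, the arithmetic behind the assertion: each of the four retry messages in the log above is 164 characters including its newline, so four of them total 656, while a 600-character limit only leaves room for three. The sketch below reproduces that count from the quoted log text and shows one possible alternative, counting retry lines instead of characters. `count_retries` is a hypothetical helper written for illustration, not something from the test suite, and this is not necessarily how the test should be fixed.

```python
# Log text copied verbatim from the failing run quoted above.
logs = (
    "Retrying <function ShuffleRun.send.<locals>._send at 0x7fa57efcb4c0> "
    "after exception in attempt 0/1: Timed out trying to connect to "
    "tcp://127.0.0.1:37919 after 0 s\n"
    "Retrying <function ShuffleRun.send.<locals>._send at 0x7fa56eb03160> "
    "after exception in attempt 0/1: Timed out trying to connect to "
    "tcp://127.0.0.1:37919 after 0 s\n"
    "Retrying <function ShuffleRun.send.<locals>._send at 0x7fa57d0ba700> "
    "after exception in attempt 0/1: Timed out trying to connect to "
    "tcp://127.0.0.1:38667 after 0 s\n"
    "Retrying <function ShuffleRun.send.<locals>._send at 0x7fa57d576670> "
    "after exception in attempt 0/1: Timed out trying to connect to "
    "tcp://127.0.0.1:38667 after 0 s\n"
)


def count_retries(text: str) -> int:
    """Hypothetical helper: count log lines that report a retry."""
    return sum(line.startswith("Retrying ") for line in text.splitlines())


# The character count depends on the function repr and address in every
# message; the line count only depends on how many retries happened.
print(len(logs))            # 656
print(count_retries(logs))  # 4
```

Counting lines would make the check independent of message length, but whatever bound is chosen would still need to reflect how many retries the test legitimately expects.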

Labels: bug, flaky test, regression, shuffle, tests
