Skip to content

Intermittent failure with test_worker_waits_for_scheduler #6153

Closed
@mrocklin

Description

@mrocklin

In CI failures like https://github.com/graingert/distributed/runs/6074901114?check_suite_focus=true we see that the test test_worker_waits_for_scheduler times out.

I ran this test locally a thousand times and was able to reproduce

from distributed.tests.test_worker import test_worker_waits_for_scheduler
for _ in range(1000): test_worker_waits_for_scheduler()

I also added in some print statements to see if we could narrow down where things were hanging.

diff --git a/distributed/tests/test_worker.py b/distributed/tests/test_worker.py
index 209eed65..14bdb194 100644
--- a/distributed/tests/test_worker.py
+++ b/distributed/tests/test_worker.py
@@ -324,15 +324,21 @@ async def test_worker_port_range(s):
 @pytest.mark.slow
 @gen_test(timeout=60)
 async def test_worker_waits_for_scheduler():
+    print(1)
     w = Worker("127.0.0.1:8724")
+    print(2)
     try:
-        await asyncio.wait_for(w, 3)
+        await asyncio.wait_for(w, 0.1)
+        print(3)
     except TimeoutError:
         pass
+        print(4)
     else:
         assert False
     assert w.status not in (Status.closed, Status.running, Status.paused)
+    print(5)
     await w.close(timeout=0.1)
+    print(6)

I found the following

1
2
2022-04-19 06:57:52,758 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:34243
2022-04-19 06:57:52,758 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:34243
2022-04-19 06:57:52,758 - distributed.worker - INFO -          dashboard at:            127.0.0.1:41951
2022-04-19 06:57:52,759 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:57:52,759 - distributed.worker - INFO - -------------------------------------------------
2022-04-19 06:57:52,759 - distributed.worker - INFO -               Threads:                         12
2022-04-19 06:57:52,760 - distributed.worker - INFO -                Memory:                  15.30 GiB
2022-04-19 06:57:52,760 - distributed.worker - INFO -       Local Directory: /home/mrocklin/workspace/distributed/dask-worker-space/worker-_11h2mch
2022-04-19 06:57:52,760 - distributed.worker - INFO - -------------------------------------------------
2022-04-19 06:57:57,762 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:02,864 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:07,966 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:13,069 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:18,171 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:23,273 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:28,376 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:33,479 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:38,582 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:43,685 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724
2022-04-19 06:58:48,789 - distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:8724

I believe that this points to the following line as the culprit

await asyncio.wait_for(w, 0.1)

This surprises me. At first glance it seems like the asyncio.wait_for method isn't doing its job.

cc @graingert in case you have any ideas

My next step here is to stop using asyncio.wait_for and see if I can get this test to pass with a manual version.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions