Closed
Description
In CI failures like https://github.com/graingert/distributed/runs/6074901114?check_suite_focus=true we see that the test test_worker_waits_for_scheduler
times out.
I ran this test locally a thousand times and was able to reproduce
from distributed.tests.test_worker import test_worker_waits_for_scheduler
for _ in range(1000): test_worker_waits_for_scheduler()
I also added in some print statements to see if we could narrow down where things were hanging.
diff --git a/distributed/tests/test_worker.py b/distributed/tests/test_worker.py
index 209eed65..14bdb194 100644
--- a/distributed/tests/test_worker.py
+++ b/distributed/tests/test_worker.py
@@ -324,15 +324,21 @@ async def test_worker_port_range(s):
@pytest.mark.slow
@gen_test(timeout=60)
async def test_worker_waits_for_scheduler():
+ print(1)
w = Worker("127.0.0.1:8724")
+ print(2)
try:
- await asyncio.wait_for(w, 3)
+ await asyncio.wait_for(w, 0.1)
+ print(3)
except TimeoutError:
pass
+ print(4)
else:
assert False
assert w.status not in (Status.closed, Status.running, Status.paused)
+ print(5)
await w.close(timeout=0.1)
+ print(6)
I found the following
1
2
2022-04-19 06:57:52,758 - distributed.worker - INFO - Start worker at: tcp://127.0.0.1:34243
2022-04-19 06:57:52,758 - distributed.worker - INFO - Listening to: tcp://127.0.0.1:34243
2022-04-19 06:57:52,758 - distributed.worker - INFO - dashboard at: 127.0.0.1:41951
2022-04-19 06:57:52,759 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:57:52,759 - distributed.worker - INFO - -------------------------------------------------
2022-04-19 06:57:52,759 - distributed.worker - INFO - Threads: 12
2022-04-19 06:57:52,760 - distributed.worker - INFO - Memory: 15.30 GiB
2022-04-19 06:57:52,760 - distributed.worker - INFO - Local Directory: /home/mrocklin/workspace/distributed/dask-worker-space/worker-_11h2mch
2022-04-19 06:57:52,760 - distributed.worker - INFO - -------------------------------------------------
2022-04-19 06:57:57,762 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:02,864 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:07,966 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:13,069 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:18,171 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:23,273 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:28,376 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:33,479 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:38,582 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:43,685 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
2022-04-19 06:58:48,789 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:8724
I believe that this points to the following line as the culprit
await asyncio.wait_for(w, 0.1)
This surprises me. At first glance it seems like the asyncio.wait_for
method isn't doing its job.
cc @graingert in case you have any ideas
My next step here is to stop using asyncio.wait_for
and see if I can get this test to pass with a manual version.
Metadata
Metadata
Assignees
Labels
No labels