Flaky test_scheduler_port_zero #6758

Open · gjoseph92 opened this issue Jul 20, 2022 · 0 comments
Labels: flaky test (Intermittent failures on CI.)

gjoseph92 (Collaborator) commented Jul 20, 2022
```
________________ ERROR at teardown of test_scheduler_port_zero _________________

cleanup = None

    @pytest.fixture
    def loop(cleanup):
>       with check_instances():

distributed/utils_test.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:142: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    @contextmanager
    def check_instances():
        Client._instances.clear()
        Worker._instances.clear()
        Scheduler._instances.clear()
        SpecCluster._instances.clear()
        Worker._initialized_clients.clear()
        SchedulerTaskState._instances.clear()
        WorkerTaskState._instances.clear()
        Nanny._instances.clear()
        _global_clients.clear()
        Comm._instances.clear()
        yield
        start = time()
        while set(_global_clients):
            sleep(0.1)
>           assert time() < start + 10
E           assert 1657573967.6341991 < (1657573957.571699 + 10)
E            +  where 1657573967.6341991 = time()

distributed/utils_test.py:1844: AssertionError
----------------------------- Captured stderr call -----------------------------
2022-07-11 21:12:29,544 - distributed.scheduler - INFO - -----------------------------------------------
2022-07-11 21:12:29,595 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2022-07-11 21:12:29,606 - distributed.scheduler - INFO - State start
2022-07-11 21:12:29,611 - distributed.scheduler - INFO - -----------------------------------------------
2022-07-11 21:12:29,612 - distributed.scheduler - INFO - Clear task state
2022-07-11 21:12:29,613 - distributed.scheduler - INFO -   Scheduler at: tcp://10.212.20.218:49565
2022-07-11 21:12:29,613 - distributed.scheduler - INFO -   dashboard at:                     :8787
2022-07-11 21:12:29,666 - distributed.scheduler - INFO - Receive client connection: Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,161 - distributed.core - INFO - Starting established connection
2022-07-11 21:12:36,162 - distributed.scheduler - INFO - Remove client Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,163 - distributed.core - INFO - Event loop was unresponsive in Scheduler for 6.51s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
2022-07-11 21:12:36,165 - distributed._signals - INFO - Received signal SIGINT (2)
2022-07-11 21:12:36,165 - distributed.scheduler - INFO - Close client connection: Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,166 - distributed.scheduler - INFO - Scheduler closing...
2022-07-11 21:12:36,166 - distributed.scheduler - INFO - Scheduler closing all comms
2022-07-11 21:12:36,168 - distributed.scheduler - INFO - Stopped scheduler at 'tcp://10.212.20.218:49565'
2022-07-11 21:12:36,168 - distributed.scheduler - INFO - End scheduler
```
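The teardown assertion means `_global_clients` still held an entry more than ten seconds after the test body finished: `check_instances()` polls for the registry to drain and enforces a 10-second deadline. A minimal sketch of that polling pattern, simplified from the `check_instances()` shown above (the real fixture also clears and checks several other instance registries; `registry` here is a stand-in for `distributed.client._global_clients`):

```python
import time

def assert_drains(registry: dict, timeout: float = 10.0) -> None:
    """Poll until `registry` empties; fail if still populated after `timeout` seconds."""
    start = time.time()
    while set(registry):       # a lingering Client keeps an entry alive here
        time.sleep(0.1)        # yield briefly so the client can finish closing
        assert time.time() < start + timeout, "instance leaked past teardown deadline"
```

Given the `Event loop was unresponsive in Scheduler for 6.51s` line above, the runner was badly stalled, so the client may simply not have finished closing within the deadline rather than genuinely leaking.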
```
=================================== FAILURES ===================================
___________________________ test_scheduler_port_zero ___________________________

self = <TCP (closed) Client->Scheduler local=tcp://10.212.20.218:49566 remote=tcp://10.212.20.218:49565>
deserializers = None

    async def read(self, deserializers=None):
        stream = self.stream
        if stream is None:
            raise CommClosedError()
        fmt = "Q"
        fmt_size = struct.calcsize(fmt)
        try:
>           frames_nbytes = await stream.read_bytes(fmt_size)
E           asyncio.exceptions.CancelledError

distributed/comm/tcp.py:223: CancelledError

During handling of the above exception, another exception occurred:

fut = <Task cancelled name='Task-236' coro=<TCP.read() done, defined at /Users/runner/work/distributed/distributed/distributed/comm/tcp.py:214>>
timeout = 5

    async def wait_for(fut, timeout):
        """Wait for the single Future or coroutine to complete, with timeout.
        Coroutine will be wrapped in Task.
        Returns result of the Future or coroutine.  When a timeout occurs,
        it cancels the task and raises TimeoutError.  To avoid the task
        cancellation, wrap it in shield().
        If the wait is cancelled, the task is also cancelled.
        This function is a coroutine.
        """
        loop = events.get_running_loop()
        if timeout is None:
            return await fut
        if timeout <= 0:
            fut = ensure_future(fut, loop=loop)
            if fut.done():
                return fut.result()
            await _cancel_and_wait(fut, loop=loop)
            try:
                return fut.result()
            except exceptions.CancelledError as exc:
                raise exceptions.TimeoutError() from exc
        waiter = loop.create_future()
        timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
        cb = functools.partial(_release_waiter, waiter)
        fut = ensure_future(fut, loop=loop)
        fut.add_done_callback(cb)
        try:
            # wait until the future completes or the timeout
            try:
                await waiter
            except exceptions.CancelledError:
                if fut.done():
                    return fut.result()
                else:
                    fut.remove_done_callback(cb)
                    # We must ensure that the task is not running
                    # after wait_for() returns.
                    # See https://bugs.python.org/issue32751
                    await _cancel_and_wait(fut, loop=loop)
                    raise
            if fut.done():
                return fut.result()
            else:
                fut.remove_done_callback(cb)
                # We must ensure that the task is not running
                # after wait_for() returns.
                # See https://bugs.python.org/issue32751
                await _cancel_and_wait(fut, loop=loop)
                # In case task cancellation failed with some
                # exception, we should re-raise it
                # See https://bugs.python.org/issue40607
                try:
>                   return fut.result()
E                   asyncio.exceptions.CancelledError

../../../miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py:456: CancelledError

The above exception was the direct cause of the following exception:

loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x1387b06d0>

    def test_scheduler_port_zero(loop):
        with tmpfile() as fn:
            with popen(
                ["dask-scheduler", "--no-dashboard", "--scheduler-file", fn, "--port", "0"]
            ):
>               with Client(scheduler_file=fn, loop=loop) as c:

distributed/cli/tests/test_dask_scheduler.py:255:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
distributed/client.py:940: in __init__
    self.start(timeout=timeout)
distributed/client.py:1098: in start
    sync(self.loop, self._start, **kwargs)
distributed/utils.py:405: in sync
    raise exc.with_traceback(tb)
distributed/utils.py:378: in f
    result = yield future
../../../miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py:762: in run
    value = future.result()
distributed/client.py:1178: in _start
    await self._ensure_connected(timeout=timeout)
distributed/client.py:1265: in _ensure_connected
    msg = await asyncio.wait_for(comm.read(), timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

fut = <Task cancelled name='Task-236' coro=<TCP.read() done, defined at /Users/runner/work/distributed/distributed/distributed/comm/tcp.py:214>>
timeout = 5

    async def wait_for(fut, timeout):
        """Wait for the single Future or coroutine to complete, with timeout.
        Coroutine will be wrapped in Task.
        Returns result of the Future or coroutine.  When a timeout occurs,
        it cancels the task and raises TimeoutError.  To avoid the task
        cancellation, wrap it in shield().
        If the wait is cancelled, the task is also cancelled.
        This function is a coroutine.
        """
        loop = events.get_running_loop()
        if timeout is None:
            return await fut
        if timeout <= 0:
            fut = ensure_future(fut, loop=loop)
            if fut.done():
                return fut.result()
            await _cancel_and_wait(fut, loop=loop)
            try:
                return fut.result()
            except exceptions.CancelledError as exc:
                raise exceptions.TimeoutError() from exc
        waiter = loop.create_future()
        timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
        cb = functools.partial(_release_waiter, waiter)
        fut = ensure_future(fut, loop=loop)
        fut.add_done_callback(cb)
        try:
            # wait until the future completes or the timeout
            try:
                await waiter
            except exceptions.CancelledError:
                if fut.done():
                    return fut.result()
                else:
                    fut.remove_done_callback(cb)
                    # We must ensure that the task is not running
                    # after wait_for() returns.
                    # See https://bugs.python.org/issue32751
                    await _cancel_and_wait(fut, loop=loop)
                    raise
            if fut.done():
                return fut.result()
            else:
                fut.remove_done_callback(cb)
                # We must ensure that the task is not running
                # after wait_for() returns.
                # See https://bugs.python.org/issue32751
                await _cancel_and_wait(fut, loop=loop)
                # In case task cancellation failed with some
                # exception, we should re-raise it
                # See https://bugs.python.org/issue40607
                try:
                    return fut.result()
                except exceptions.CancelledError as exc:
>                   raise exceptions.TimeoutError() from exc
E                   asyncio.exceptions.TimeoutError

../../../miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py:458: TimeoutError
----------------------------- Captured stderr call -----------------------------
2022-07-11 21:12:29,544 - distributed.scheduler - INFO - -----------------------------------------------
2022-07-11 21:12:29,595 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2022-07-11 21:12:29,606 - distributed.scheduler - INFO - State start
2022-07-11 21:12:29,611 - distributed.scheduler - INFO - -----------------------------------------------
2022-07-11 21:12:29,612 - distributed.scheduler - INFO - Clear task state
2022-07-11 21:12:29,613 - distributed.scheduler - INFO -   Scheduler at: tcp://10.212.20.218:49565
2022-07-11 21:12:29,613 - distributed.scheduler - INFO -   dashboard at:                     :8787
2022-07-11 21:12:29,666 - distributed.scheduler - INFO - Receive client connection: Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,161 - distributed.core - INFO - Starting established connection
2022-07-11 21:12:36,162 - distributed.scheduler - INFO - Remove client Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,163 - distributed.core - INFO - Event loop was unresponsive in Scheduler for 6.51s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
2022-07-11 21:12:36,165 - distributed._signals - INFO - Received signal SIGINT (2)
2022-07-11 21:12:36,165 - distributed.scheduler - INFO - Close client connection: Client-2aec3a60-015e-11ed-b17e-005056b238ae
2022-07-11 21:12:36,166 - distributed.scheduler - INFO - Scheduler closing...
2022-07-11 21:12:36,166 - distributed.scheduler - INFO - Scheduler closing all comms
2022-07-11 21:12:36,168 - distributed.scheduler - INFO - Stopped scheduler at 'tcp://10.212.20.218:49565'
2022-07-11 21:12:36,168 - distributed.scheduler - INFO - End scheduler
```

https://github.com/dask/distributed/runs/7289414459?check_suite_focus=true#step:11:2128
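For reference, the failing step is the 5-second `asyncio.wait_for(comm.read(), timeout)` in `Client._ensure_connected` (distributed/client.py:1265): the scheduler's event loop stalled for 6.51 s, so the handshake reply could not arrive before the deadline. A self-contained sketch of that timing, where `handshake_reply` is a hypothetical stand-in for `comm.read()`:

```python
import asyncio

async def handshake_reply(delay: float) -> str:
    # Hypothetical stand-in for comm.read(): the scheduler's reply only
    # arrives once its event loop stops stalling (6.51 s in the CI logs).
    await asyncio.sleep(delay)
    return "ok"

async def main() -> None:
    try:
        # Client._ensure_connected waits at most 5 s for the reply.
        print(await asyncio.wait_for(handshake_reply(6.5), timeout=5))
    except asyncio.TimeoutError:
        print("handshake timed out -> the TimeoutError in this report")

asyncio.run(main())
```

On an overloaded runner a stall of that length explains both the connect timeout and the teardown assertion, so this looks like CI slowness rather than a logic bug.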
