Closed
Description
distributed/shuffle/tests/test_shuffle.py::test_bad_disk has started failing on main with the traceback below. See this CI run for an example.
________________________________ test_bad_disk _________________________________
1 thread(s) were leaked from test
------ Call stack of leaked thread 1/1: <Thread(ThreadPoolExecutor-69_0, started 140170015799040)> ------
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/concurrent/futures/thread.py", line 81, in _worker
    work_item = work_queue.get(block=True)
----------------------------- Captured stderr call -----------------------------
2022-10-27 14:34:53,950 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 7)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 7, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,951 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 1)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 1, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,955 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 0)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 0, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,987 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 2)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 2, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,987 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 5)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 5, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,987 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 3)
Function: shuffle_unpack
args: ('3110a8a90a5b642409b0a20f83b03722', 3, None)
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
2022-10-27 14:34:53,990 - distributed.worker - ERROR - Exception during execution of task ('shuffle-p2p-4651c82ee05b6682b3f73b2b18ad74e5', 4).
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/worker.py", line 2341, in _prepare_args_for_execution
    data[k] = self.data[k]
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/zict/buffer.py", line 108, in __getitem__
    raise KeyError(key)
KeyError: 'shuffle-barrier-3110a8a90a5b642409b0a20f83b03722'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/worker.py", line 2239, in execute
    args2, kwargs2 = self._prepare_args_for_execution(ts, args, kwargs)
  File "/home/runner/work/distributed/distributed/distributed/worker.py", line 2345, in _prepare_args_for_execution
    data[k] = Actor(type(self.state.actors[k]), self.address, k, self)
KeyError: 'shuffle-barrier-3110a8a90a5b642409b0a20f83b03722'
2022-10-27 14:34:53,996 - distributed.diskutils - ERROR - Failed to remove '/tmp/dask-worker-space/worker-dt9g6wgx' (failed in <built-in function lstat>): [Errno 2] No such file or directory: '/tmp/dask-worker-space/worker-dt9g6wgx'
2022-10-27 14:34:53,996 - distributed.diskutils - ERROR - Failed to remove '/tmp/dask-worker-space/worker-ihvksve8' (failed in <built-in function lstat>): [Errno 2] No such file or directory: '/tmp/dask-worker-space/worker-ihvksve8'
------------------------------ Captured log call -------------------------------
ERROR asyncio:base_events.py:1753 Task exception was never retrieved
future: <Task finished name='Task-65302' coro=<Shuffle.receive() done, defined at /home/runner/work/distributed/distributed/distributed/shuffle/_shuffle_extension.py:142> exception=FileNotFoundError(2, 'No such file or directory')>
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/shuffle/_shuffle_extension.py", line 148, in receive
    raise self._exception
  File "/home/runner/work/distributed/distributed/distributed/shuffle/_shuffle_extension.py", line 172, in receive
    await self.multi_file.put(groups)
  File "/home/runner/work/distributed/distributed/distributed/shuffle/_multi_file.py", line 124, in put
    raise self._exception
  File "/home/runner/work/distributed/distributed/distributed/shuffle/_multi_file.py", line 202, in process
    with open(
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/dask-worker-space/worker-ihvksve8/shuffle-3110a8a90a5b642409b0a20f83b03722/1'
- generated xml file: /home/runner/work/distributed/distributed/reports/pytest.xml -
cc @fjetter as I know you've made some shuffle-related changes recently (not sure if they're related though)
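For anyone triaging, here is a minimal standalone sketch of the two symptoms in the output above. It does not use any distributed code and is not a diagnosis of the actual cause. The helper names (`executor_threads`, `demo_leaked_thread`, `demo_missing_directory`), the `demo-pool` thread prefix, the `shuffle-demo` directory name and the `"ab"` open mode are all made up for illustration. The first function shows that an idle `ThreadPoolExecutor` worker stays alive, blocked on its work queue, until the executor is shut down. The second shows that opening a file inside a directory that has already been removed raises `FileNotFoundError` with errno 2.

```python
import errno
import os
import shutil
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def executor_threads(prefix):
    """Return the live threads whose name starts with ``prefix``."""
    return [t for t in threading.enumerate() if t.name.startswith(prefix)]


def demo_leaked_thread():
    """Count pool threads before and after an explicit shutdown."""
    # The name prefix is hypothetical; it only lets us find the worker again.
    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="demo-pool")
    executor.submit(lambda: None).result()
    # The task is done, but the worker thread is still alive and waiting
    # for more work. A leak checker would report it at this point.
    alive_before = len(executor_threads("demo-pool"))
    # shutdown(wait=True) wakes the worker and joins it.
    executor.shutdown(wait=True)
    alive_after = len(executor_threads("demo-pool"))
    return alive_before, alive_after


def demo_missing_directory():
    """Open a file under a directory that was removed; return the errno."""
    workdir = tempfile.mkdtemp()
    shuffle_dir = os.path.join(workdir, "shuffle-demo")  # hypothetical name
    os.mkdir(shuffle_dir)
    # Simulate the worker directory disappearing before the write happens.
    shutil.rmtree(workdir)
    try:
        # The "ab" mode is an assumption; the traceback does not show it.
        with open(os.path.join(shuffle_dir, "1"), mode="ab") as f:
            f.write(b"data")
    except FileNotFoundError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    print(demo_leaked_thread())
    print(demo_missing_directory() == errno.ENOENT)
```

If the test's executor is never shut down after the disk failure, the first pattern would match the leaked `ThreadPoolExecutor-69_0` thread in the report, but that is only a guess from the stack.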