If an error occurs while deserializing, regrouping, serializing, or writing data to disk, and the shuffle isn't in "backpressure mode", that data is silently lost and the shuffle still succeeds.
When task isn't awaited, the asyncio task is leaked. Any error that occurred in it is also lost (besides being logged).
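To illustrate the failure mode in isolation, here is a minimal sketch using plain asyncio. The `receive` coroutine and both `shuffle_*` functions are hypothetical stand-ins, not the code in `shuffle_extension.py`. It only shows that an exception raised in a task nobody awaits never reaches the caller, so the caller reports success.

```python
import asyncio


async def receive(data):
    # Hypothetical stand-in for the receive step; it always fails.
    raise ValueError(f"could not process {data!r}")


async def shuffle_without_await():
    # Fire-and-forget: nothing awaits the task, so its exception is
    # never raised here. asyncio only logs it when the task is collected.
    asyncio.create_task(receive("partition-0"))
    await asyncio.sleep(0)  # yield once so the task gets a chance to run
    return "succeeded"


async def shuffle_with_await():
    # Awaiting the task makes the ValueError propagate to the caller.
    task = asyncio.create_task(receive("partition-0"))
    await task
    return "succeeded"


result = asyncio.run(shuffle_without_await())
print(result)
```

Running `shuffle_with_await()` instead raises the `ValueError`, which is the behavior a caller would presumably expect.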
@graingert and I tried to fix this, but we couldn't get something working:
The dumb way (keep leaking tasks, wrap all of receive in a try/except, set self._exception, and raise self._exception in add_partition, get_output_partition, and inputs_done) fails tests because of the leaked tasks.
The "proper" asyncio way causes a CancelledError to pop out in some unexpected place and seems to shut down the whole worker?
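For readers unfamiliar with the first approach, the following is a rough sketch of the "store the exception and re-raise it later" pattern. `ShuffleSketch` is a hypothetical toy class, not the real extension, and it departs from the attempt described above in one way: it keeps references to its background tasks so the example is deterministic. It does not address the cancellation or worker shutdown problems mentioned above.

```python
import asyncio


class ShuffleSketch:
    """Hypothetical, simplified stand-in; not the real shuffle class."""

    def __init__(self):
        self._exception = None
        self._tasks = set()  # assumption: hold references to background tasks
        self.received = []

    async def _receive(self, data):
        # Wrap the whole body in try/except and remember the first failure.
        try:
            if data is None:
                raise ValueError("cannot deserialize shard")
            self.received.append(data)
        except Exception as e:
            if self._exception is None:
                self._exception = e

    def _raise_if_failed(self):
        if self._exception is not None:
            raise self._exception

    async def add_partition(self, data):
        # Surface any earlier background failure before accepting more data.
        self._raise_if_failed()
        task = asyncio.create_task(self._receive(data))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def inputs_done(self):
        # Wait for outstanding work, then surface any stored failure.
        for task in list(self._tasks):
            await task
        self._raise_if_failed()


async def main():
    shuffle = ShuffleSketch()
    await shuffle.add_partition("good shard")
    await shuffle.add_partition(None)  # fails in the background
    try:
        await shuffle.inputs_done()
    except ValueError as e:
        return f"shuffle failed: {e}"
    return "shuffle succeeded"


outcome = asyncio.run(main())
print(outcome)
```

The point of the pattern is that the failure surfaces at the next call into the object rather than disappearing. The trade-off is that the error is reported later than it occurred, and only if some method is called again.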
distributed/distributed/shuffle/shuffle_extension.py
Lines 252 to 258 in 7bd6442
xref #6201