Network transfer of objects with circular recursion hangs #8378
Comments
do you have a reproducer for this error? If the cc @crusaderky |
Reproduced. Fairly sure it's not safe_sizeof though.
import distributed
client = distributed.Client(n_workers=1)
def f():
    d = {}
    d[0] = d
    return d
fut = client.submit(f, key="x")
distributed.wait(fut)
# So far so good - infinite recursion is handled gracefully
2023-11-30 14:23:48,192 - distributed.sizeof - WARNING - Sizeof calculation failed. Defaulting to 0.95 MiB
Traceback (most recent call last):
...
RecursionError: maximum recursion depth exceeded while calling a Python object
# Task is finished successfully and output is stored on the worker
client.run(lambda dask_worker: str(dask_worker.data["x"]))
{'tcp://127.0.0.1:35311': '{0: {...}}'}
# However, network transfer hangs
fut.result()
gather_dep from one worker to another also hangs. I worked on this fairly recently (#8214). Investigating. |
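Until the hang is fixed, one way to catch this before a task returns is to check the result for self-references. The helper below is only a sketch: `has_cycle` is a hypothetical name, not part of distributed, and it only looks inside built-in containers (dict values, list, tuple, set, frozenset). It is itself recursive, so very deeply nested but acyclic structures could still hit the recursion limit.

```python
def has_cycle(obj, _path=None):
    """Return True if a built-in container reachable from obj contains itself.

    Hypothetical helper for illustration; not a distributed API.
    """
    if _path is None:
        _path = set()  # ids of the containers on the current descent path
    if not isinstance(obj, (dict, list, tuple, set, frozenset)):
        return False  # opaque objects are not inspected
    oid = id(obj)
    if oid in _path:
        return True  # we came back to a container we are still inside of
    _path.add(oid)
    try:
        children = list(obj.values()) if isinstance(obj, dict) else list(obj)
        return any(has_cycle(child, _path) for child in children)
    finally:
        # Leaving this container: shared (but acyclic) references are fine
        _path.discard(oid)


d = {}
d[0] = d
cyclic = has_cycle(d)

shared = [1]
acyclic = has_cycle({"a": shared, "b": [shared, shared]})
```

Here `cyclic` is True for the dict from the reproducer, while `acyclic` is False because the same list appearing twice is not a cycle.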
FWIW If #8214 is the cause, this has already been released |
Reproduced with dask=2023.9.3 msgpack=1.0.5 (before #8214). This is not a recent regression. |
Just to mention that the issue is still there after upgrading to |
Even more minimal reproducer:
>>> from distributed.protocol import serialize
>>> d = {}
>>> d[0] = d
>>> serialize(d)
RecursionError: maximum recursion depth exceeded
>>> from collections import UserDict
>>> d2 = UserDict(d) # Wrap in opaque object to use plain pickle
>>> serialize(d2)
({'serializer': 'pickle', 'writeable': ()},
[b'\x80\x05\x956\x00\x00\x00\x00\x00\x00\x00\x8c\x0bcollections\x94\x8c\x08UserDict\x94\x93\x94)\x81\x94}\x94\x8c\x04data\x94}\x94K\x00}\x94K\x00h\x07sssb.']) |
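The UserDict case works because plain pickle keeps a memo of objects it has already written, so the inner reference is emitted as a back-reference instead of being descended into again (the `h\x07` near the end of the bytes above). A stdlib-only sketch of that behaviour, without involving distributed at all:

```python
import pickle
from collections import UserDict

d = {}
d[0] = d  # self-referential dict, as in the reproducer

# pickle memoizes the dict before saving its items, so the cycle
# round-trips instead of recursing forever
restored = pickle.loads(pickle.dumps(d))

d2 = UserDict(d)  # d2.data == {0: d}
restored2 = pickle.loads(pickle.dumps(d2))
inner = restored2.data[0]  # the restored copy of d
```

After the round trip, `restored[0] is restored` and `inner[0] is inner` both hold, so the cycle is preserved rather than flattened.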
I am experiencing the same issue when attempting to use |
Describe the issue:
I have Dask submitting jobs to condor. The jobs seem to work fine and produce the output.
However, they crash at the end with the following errors:
Workers crash due to exceeding the recursion depth, but the problem seems to be in the safe_sizeof() method or in the meth:
Minimal Complete Verifiable Example:
None
Environment: