test_restarting_does_not_deadlock flaky due to import error on deserialization #8759

Open
fjetter opened this issue Jul 11, 2024 · 0 comments
Labels
flaky test Intermittent failures on CI. regression

Comments

Member

fjetter commented Jul 11, 2024

Example run: https://github.com/dask/distributed/actions/runs/9886284934/job/27305772628

Apparently we're failing to deserialize an object, which causes this test to fail intermittently. The failure message is not very descriptive, though: it only surfaces as an ImportError raised while unpickling.

Traceback on CI
2024-07-11 06:14:38,059 - distributed.shuffle._scheduler_plugin - WARNING - Shuffle ef8af1c8020cbdda5f63d080f5c59afc initialized by task ('shuffle-transfer-ef8af1c8020cbdda5f63d080f5c59afc', 9) executed on worker tcp://127.0.0.1:46279
2024-07-11 06:14:38,900 - distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x05\x95v\x04\x00\x00\x00\x00\x00\x00\x8c\x12dask_expr._shuffle\x94\x8c!AssignPartitioningIndex.operation\x94\x93\x94(\x8c0to_string_dtype-7ac6668f53d83fc03a572fcc298585c2\x94K#\x86\x94]\x94\x8c\x01x\x94a\x8c\x0b_partitions\x94K<\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x0cBlockManager\x94\x93\x94\x8c\x16pandas._libs.internals\x94\x8c\x0f_unpickle_block\x94\x93\x94\x8c\x13numpy._core.numeric\x94\x8c\x0b_frombuffer\x94\x93\x94(\x97\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x02K\x00\x86\x94\x8c\x01C\x94t\x94R\x94\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94K\x00K\x02K\x01\x87\x94R\x94K\x02\x87\x94R\x94\x85\x94]\x94(\x8c\x18pandas.core.indexes.base\x94\x8c\n_new_Index\x94\x93\x94h,\x8c\x05Index\x94\x93\x94}\x94(\x8c\x04data\x94\x8c\x16numpy._core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94h\x17\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x02\x85\x94h\x19\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(h\x06\x8c\x01y\x94et\x94b\x8c\x04name\x94Nu\x86\x94R\x94\x8c\x1dpandas.core.indexes.datetimes\x94\x8c\x12_new_DatetimeIndex\x94\x93\x94hH\x8c\rDatetimeIndex\x94\x93\x94}\x94(h2\x8c\x13pandas._libs.arrays\x94\x8c\x1c__pyx_unpickle_NDArrayBacked\x94\x93\x94\x8c\x1cpandas.core.arrays.datetimes\x94\x8c\rDatetimeArray\x94\x93\x94J\x1f\x06\xf1\x04N\x87\x94R\x94h\x19\x8c\x02M8\x94\x89\x88\x87\x94R\x94(K\x04h\x1dNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00}\x94(C\x02ns\x94K\x01K\x01K\x01t\x94\x86\x94t\x94bh5h7K\x00\x85\x94h9\x87\x94R\x94(K\x01K\x00\x85\x94h\x19\x8c\x02M8\x94\x89\x88\x87\x94R\x94(K\x04h\x1dNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00}\x94(C\x02ns\x94K\x01K\x01K\x01t\x94\x86\x94t\x94b\x89C\x00\x94t\x9
4b}\x94\x8c\x05_freq\x94\x8c\x1bpandas._libs.tslibs.offsets\x94\x8c\x06Second\x94\x93\x94K\n\x89\x86\x94R\x94s\x87\x94bhE\x8c\ttimestamp\x94u\x86\x94R\x94e\x86\x94R\x94\x8c\x04_typ\x94\x8c\tdataframe\x94\x8c\t_metadata\x94]\x94\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88sub\x89t\x94}\x94\x87\x94.'
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/protocol/pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/__init__.py", line 3, in <module>
    from dask_expr import _version, datasets
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/datasets.py", line 8, in <module>
    from dask_expr._collection import new_collection
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/_collection.py", line 13, in <module>
    import dask.dataframe.methods as methods
ImportError: cannot import name 'dataframe' from 'dask' (/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask/__init__.py)
2024-07-11 06:14:38,905 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/protocol/core.py", line 175, in loads
    return msgpack.loads(
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/home/runner/work/distributed/distributed/distributed/protocol/core.py", line 172, in _decode_default
    return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
  File "/home/runner/work/distributed/distributed/distributed/protocol/pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/__init__.py", line 3, in <module>
    from dask_expr import _version, datasets
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/datasets.py", line 8, in <module>
    from dask_expr._collection import new_collection
  File "/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask_expr/_collection.py", line 13, in <module>
    import dask.dataframe.methods as methods
ImportError: cannot import name 'dataframe' from 'dask' (/home/runner/miniconda3/envs/dask-distributed/lib/python3.9/site-packages/dask/__init__.py)
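
For triage, here is a minimal sketch (not part of distributed) that lists the module-level names a pickle payload would import, without actually unpickling it. `referenced_globals` is a hypothetical helper built only on the standard library's `pickletools.genops`. Its memo tracking is a heuristic that assumes the opcode layout CPython's pickler emits for global references. `SAMPLE` is just the first few opcodes of the payload logged above with a STOP opcode appended, so it says nothing about the rest of the payload.

```python
import pickletools

# Opcodes that push a text string onto the unpickler stack.
_STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def referenced_globals(payload: bytes):
    """Return (module, qualname) pairs that unpickling `payload` would import.

    Hypothetical helper: it walks the opcode stream and never imports anything.
    """
    found = []
    memo = {}      # memo index -> string (or None for non-string objects)
    recent = []    # strings pushed so far, either literal or fetched from the memo
    top = None     # string currently on top of the stack, if known

    for opcode, arg, _pos in pickletools.genops(payload):
        name = opcode.name
        if name in _STRING_OPS:
            top = arg
            recent.append(top)
        elif name in _GET_OPS:
            top = memo.get(arg)
            recent.append(top)
        elif name == "MEMOIZE":
            memo[len(memo)] = top      # protocol 4+: index is the memo size
        elif name in _PUT_OPS:
            memo[arg] = top
        elif name == "GLOBAL":         # protocols 0-3: "module qualname"
            module, qualname = arg.split(" ", 1)
            found.append((module, qualname))
            top = None
        elif name == "STACK_GLOBAL":   # protocol 4+: module and qualname on stack
            found.append((recent[-2], recent[-1]))
            top = None
        else:
            top = None
    return found


# First opcodes of the logged payload, terminated with STOP (b".").
SAMPLE = (
    b"\x80\x05\x95v\x04\x00\x00\x00\x00\x00\x00"
    b"\x8c\x12dask_expr._shuffle\x94"
    b"\x8c!AssignPartitioningIndex.operation\x94\x93\x94."
)

print(referenced_globals(SAMPLE))
```

The very first object in the payload is a reference into `dask_expr._shuffle`, which matches the tracebacks: `pickle.loads` has to import `dask_expr` before it can resolve anything else, and that import is what fails.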

Interestingly, I can reproduce this locally, but with a different and more telling traceback: the import of `dask_expr` fails with a `KeyError` inside `importlib` rather than with an `ImportError`.

2024-07-11 12:39:54,828 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:57003
2024-07-11 12:39:54,945 - distributed.shuffle._scheduler_plugin - WARNING - Shuffle c86d91ee032fa0fad7572f33c5a71522 initialized by task ('shuffle-transfer-c86d91ee032fa0fad7572f33c5a71522', 24) executed on worker tcp://127.0.0.1:57010
/Users/fjetter/workspace/dask/dask/dataframe/__init__.py:42: FutureWarning:
Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
2024-07-11 12:39:55,342 - distributed.protocol.pickle - INFO - Failed to deserialize b"\x80\x05\x95v\x04\x00\x00\x00\x00\x00\x00\x8c\x12dask_expr._shuffle\x94\x8c!AssignPartitioningIndex.operation\x94\x93\x94(\x8c0to_string_dtype-6905eb7babca4a00cd477a9aeb2e90fe\x94K'\x86\x94]\x94\x8c\x01x\x94a\x8c\x0b_partitions\x94K<\x8c\x11pandas.core.frame\x94\x8c\tDataFrame\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c\x0cBlockManager\x94\x93\x94\x8c\x16pandas._libs.internals\x94\x8c\x0f_unpickle_block\x94\x93\x94\x8c\x13numpy._core.numeric\x94\x8c\x0b_frombuffer\x94\x93\x94(\x97\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94bK\x02K\x00\x86\x94\x8c\x01C\x94t\x94R\x94\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94K\x00K\x02K\x01\x87\x94R\x94K\x02\x87\x94R\x94\x85\x94]\x94(\x8c\x18pandas.core.indexes.base\x94\x8c\n_new_Index\x94\x93\x94h,\x8c\x05Index\x94\x93\x94}\x94(\x8c\x04data\x94\x8c\x16numpy._core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94h\x17\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x02\x85\x94h\x19\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(h\x06\x8c\x01y\x94et\x94b\x8c\x04name\x94Nu\x86\x94R\x94\x8c\x1dpandas.core.indexes.datetimes\x94\x8c\x12_new_DatetimeIndex\x94\x93\x94hH\x8c\rDatetimeIndex\x94\x93\x94}\x94(h2\x8c\x13pandas._libs.arrays\x94\x8c\x1c__pyx_unpickle_NDArrayBacked\x94\x93\x94\x8c\x1cpandas.core.arrays.datetimes\x94\x8c\rDatetimeArray\x94\x93\x94J\x1f\x06\xf1\x04N\x87\x94R\x94h\x19\x8c\x02M8\x94\x89\x88\x87\x94R\x94(K\x04h\x1dNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00}\x94(C\x02ns\x94K\x01K\x01K\x01t\x94\x86\x94t\x94bh5h7K\x00\x85\x94h9\x87\x94R\x94(K\x01K\x00\x85\x94h\x19\x8c\x02M8\x94\x89\x88\x87\x94R\x94(K\x04h\x1dNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00}\x94(C\x02ns\x94K\x01K\x01K\x01t\x94\x86\x94t\x94b\x89C\x00\x94t\x9
4b}\x94\x8c\x05_freq\x94\x8c\x1bpandas._libs.tslibs.offsets\x94\x8c\x06Second\x94\x93\x94K\n\x89\x86\x94R\x94s\x87\x94bhE\x8c\ttimestamp\x94u\x86\x94R\x94e\x86\x94R\x94\x8c\x04_typ\x94\x8c\tdataframe\x94\x8c\t_metadata\x94]\x94\x8c\x05attrs\x94}\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88sub\x89t\x94}\x94\x87\x94."
Traceback (most recent call last):
  File "/Users/fjetter/workspace/distributed/distributed/protocol/pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
  File "/Users/fjetter/workspace/dask-expr/dask_expr/_shuffle.py", line 66, in <module>
    from dask_expr._repartition import Repartition, RepartitionToFewer
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1009, in _find_and_load_unlocked
KeyError: 'dask_expr'
2024-07-11 12:39:55,348 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/Users/fjetter/workspace/distributed/distributed/protocol/core.py", line 175, in loads
    return msgpack.loads(
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/Users/fjetter/workspace/distributed/distributed/protocol/core.py", line 172, in _decode_default
    return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
  File "/Users/fjetter/workspace/distributed/distributed/protocol/pickle.py", line 94, in loads
    return pickle.loads(x, buffers=buffers)
  File "/Users/fjetter/workspace/dask-expr/dask_expr/_shuffle.py", line 66, in <module>
    from dask_expr._repartition import Repartition, RepartitionToFewer
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1009, in _find_and_load_unlocked
KeyError: 'dask_expr'