Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
The fixed-shape tensor extension type, and arrays of that type, do not appear to be picklable. Pickling Arrow data is supported in general and is relied on by Python-centric systems such as Ray, so it seems reasonable for canonical extension types and their arrays to support pickling as well.
Reproduction
Extension Type
import pickle
import pyarrow as pa

pickle.loads(pickle.dumps(pa.fixed_shape_tensor(pa.int64(), (2, 2))))
raises the error:
KeyError Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()
KeyError: 'extension<arrow.fixed_shape_tensor>'
Extension Array
tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage = pa.array(arr, pa.list_(pa.int32(), 4))
tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
pickle.loads(pickle.dumps(tensor_array))
raises essentially the same error:
KeyError Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()
KeyError: 'extension<arrow.fixed_shape_tensor>'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[13], line 1
----> 1 pickle.loads(pickle.dumps(tensor_array))
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4800, in pyarrow.lib.type_for_alias()
ValueError: No type alias for extension<arrow.fixed_shape_tensor>
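Both tracebacks end in type_for_alias, which suggests that the data type is pickled as its string form and rebuilt by looking that string up in an alias table that has no entry for extension<arrow.fixed_shape_tensor>. The model below reproduces that failure pattern with the standard library only; ToyDataType, type_for_alias and the alias table here are hypothetical stand-ins, not pyarrow's actual code.

```python
import pickle

# Hypothetical alias table standing in for pyarrow's internal lookup;
# like the real one (judging by the traceback), it has no entry for
# the fixed-shape tensor extension type.
_TYPE_ALIASES = {"int32": "int32", "int64": "int64", "float64": "float64"}


class ToyDataType:
    """Hypothetical type object identified only by its string form."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __eq__(self, other):
        return isinstance(other, ToyDataType) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __reduce__(self):
        # Pickle only the string form; unpickling calls type_for_alias(name).
        return type_for_alias, (str(self),)


def type_for_alias(name):
    # Same failure shape as the traceback: the KeyError from the table
    # lookup is re-raised as ValueError("No type alias for ...").
    name = name.lower()
    try:
        alias = _TYPE_ALIASES[name]
    except KeyError:
        raise ValueError(f"No type alias for {name}")
    return ToyDataType(alias)


# A plain type round-trips because its string form is a known alias.
assert pickle.loads(pickle.dumps(ToyDataType("int64"))) == ToyDataType("int64")

# Pickling the extension-like type succeeds, but unpickling it fails.
payload = pickle.dumps(ToyDataType("extension<arrow.fixed_shape_tensor>"))
try:
    pickle.loads(payload)
except ValueError as exc:
    print(exc)  # No type alias for extension<arrow.fixed_shape_tensor>
```

In this model, dumping never fails; the error only surfaces on loads, which matches where both reported tracebacks are raised.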
Environment
- pyarrow 12.0.0
- Python 3.9
- macOS
Possible Solution
It might be possible to implement __reduce__ on FixedShapeTensorType so that it reuses the __arrow_ext_serialize__ / __arrow_ext_deserialize__ serialization protocol, e.g.:
def __reduce__(self):
    # Rebuild the type from its storage type plus its serialized metadata.
    return type(self).__arrow_ext_deserialize__, (self.storage_type, self.__arrow_ext_serialize__())
Component(s)
Python
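The __reduce__ idea under Possible Solution can be sketched without pyarrow. ToyTensorType and ToyExtensionArray below are hypothetical stand-ins, and the JSON metadata format is an assumption made for illustration. The sketch only shows that reducing a type to (deserialize hook, (storage type, serialized metadata)) lets pickle rebuild it, and that an array whose pickled state includes its type then round-trips too.

```python
import json
import pickle


class ToyTensorType:
    """Hypothetical stand-in for an extension type; not pyarrow's class."""

    def __init__(self, storage_type, shape):
        # A plain string here; in pyarrow the storage type is a DataType.
        self.storage_type = storage_type
        self.shape = tuple(shape)

    def __arrow_ext_serialize__(self):
        # Encode the parameters not captured by the storage type (assumed JSON).
        return json.dumps({"shape": list(self.shape)}).encode()

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        params = json.loads(serialized.decode())
        return cls(storage_type, params["shape"])

    def __reduce__(self):
        # The proposed pattern: pickle a bound classmethod plus its arguments,
        # so unpickling calls __arrow_ext_deserialize__(storage_type, metadata).
        return type(self).__arrow_ext_deserialize__, (
            self.storage_type,
            self.__arrow_ext_serialize__(),
        )

    def __eq__(self, other):
        return (
            isinstance(other, ToyTensorType)
            and self.storage_type == other.storage_type
            and self.shape == other.shape
        )

    def __hash__(self):
        return hash((self.storage_type, self.shape))


class ToyExtensionArray:
    """Hypothetical array holding an extension type and its storage values."""

    def __init__(self, type_, storage):
        self.type = type_
        self.storage = storage

    def __reduce__(self):
        # Pickling the array pickles its type as well, so the type's
        # __reduce__ is what makes the array picklable.
        return ToyExtensionArray, (self.type, self.storage)


tensor_type = ToyTensorType("fixed_size_list<int64>[4]", (2, 2))
assert pickle.loads(pickle.dumps(tensor_type)) == tensor_type

arr = ToyExtensionArray(tensor_type, [[1, 2, 3, 4], [10, 20, 30, 40]])
restored = pickle.loads(pickle.dumps(arr))
assert restored.type == tensor_type and restored.storage == arr.storage
```

Bound methods, including classmethods, are picklable in Python 3, so returning the deserialize hook itself from __reduce__ works without registering a separate module-level reconstructor function.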