[Python] Canonical fixed-shape tensor extension array/type is not picklable. #35599

@clarkzinzow

Describe the bug, including details regarding any error messages, version, and platform.

The fixed-shape tensor extension type does not appear to be picklable. Given that pickling Arrow data is supported in general and is used in Python-centric systems such as Ray, supporting pickling for canonical extension types/arrays seems reasonable.

Reproduction

Extension Type

pickle.loads(pickle.dumps(pa.fixed_shape_tensor(pa.int64(), (2, 2))))

raises the error:

KeyError                                  Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()

KeyError: 'extension<arrow.fixed_shape_tensor>'

Extension Array

import pickle
import pyarrow as pa

# Build a 3-element tensor array whose storage is fixed-size lists of 4 int32 values.
tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage = pa.array(arr, pa.list_(pa.int32(), 4))
tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
pickle.loads(pickle.dumps(tensor_array))

raises essentially the same KeyError, this time chained into a ValueError:

KeyError                                  Traceback (most recent call last)
File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4798, in pyarrow.lib.type_for_alias()

KeyError: 'extension<arrow.fixed_shape_tensor>'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 pickle.loads(pickle.dumps(tensor_array))

File .../venv/lib/python3.9/site-packages/pyarrow/types.pxi:4800, in pyarrow.lib.type_for_alias()

ValueError: No type alias for extension<arrow.fixed_shape_tensor>

Environment

  • pyarrow 12.0.0
  • Python 3.9
  • macOS

Possible Solution

It seems we could implement __reduce__ on FixedShapeTensorType so that it reuses the __arrow_ext_serialize__ / __arrow_ext_deserialize__ serialization protocol, e.g.:

def __reduce__(self):
    # Rebuild from the storage type plus the serialized extension metadata.
    return type(self).__arrow_ext_deserialize__, (self.storage_type, self.__arrow_ext_serialize__())
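
The idea behind this proposal can be illustrated without pyarrow. The sketch below is a pure-Python stand-in: TensorTypeStandIn is a hypothetical class, not pyarrow's FixedShapeTensorType, and the JSON metadata format and the string used as the storage type are assumptions made only for illustration. It shows how __reduce__ can route pickling through a serialize/deserialize pair, so the object is rebuilt from its storage type plus serialized parameters instead of being looked up by a type alias, which is the step that fails in the tracebacks above.

```python
import json
import pickle


class TensorTypeStandIn:
    """Hypothetical stand-in for an extension type; NOT a pyarrow class."""

    def __init__(self, storage_type, shape):
        self.storage_type = storage_type  # assumption: a plain string stands in for the Arrow storage type
        self.shape = tuple(shape)

    def __arrow_ext_serialize__(self):
        # Encode only the parameters that the storage type does not already carry.
        return json.dumps({"shape": list(self.shape)}).encode()

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        # Rebuild the type from its storage type plus the serialized metadata.
        params = json.loads(serialized.decode())
        return cls(storage_type, params["shape"])

    def __reduce__(self):
        # pickle records the callable by reference and calls it with these
        # arguments when the object is loaded.
        return (
            type(self).__arrow_ext_deserialize__,
            (self.storage_type, self.__arrow_ext_serialize__()),
        )

    def __eq__(self, other):
        return (
            isinstance(other, TensorTypeStandIn)
            and self.storage_type == other.storage_type
            and self.shape == other.shape
        )


original = TensorTypeStandIn("fixed_size_list<int64>[4]", (2, 2))
restored = pickle.loads(pickle.dumps(original))
assert restored == original
```

Whether the real FixedShapeTensorType exposes these two hooks in a form that can be used this way would need to be checked against the pyarrow source; the stand-in only demonstrates the round-trip mechanics.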

Component(s)

Python
