-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] Custom Python type/array subclasses for ExtensionTypes implemented in C++ #33997
Comments
arrow/python/pyarrow/types.pxi Lines 1095 to 1100 in 45918a9
Perhaps one way to do this would be to modify |
I'm looking to subclass |
@rok I didn't get too far with this.
|
If I'm reading the discussion (in #34483 and other threads) correctly the exotic part is automatic generation of types over explicit manual definition for canonical types only? Otherwise the proposed approach would be suitable, right? |
Yes, I think it would. |
When wrapping a type (or array) in a pyarrow object, we need to define which Python class to use. Currently, for extension types, this logic lives here in
pyarrow_wrap_data_type
:arrow/python/pyarrow/public-api.pxi
Lines 114 to 120 in b413ac4
So there are currently two options:
pyarrow.(Py)ExtensionType
, and which links to the C++arrow::py::PyExtensionType
(a subclass ofarrow::ExtensionType
). In this case, we store the python type instance on the C++ instance, and return this as python object inpyarrow_wrap_data_type
.pyarrow.BaseExtenstionType
base class (there is currently a bug in this, but that is getting fixed in GH-33802).However, that means that for such extension types implemented in C++, there is currently no way to have a "richer" python Type object (or Array object, since that is determined by the Type, and for a BaseExtensionType, that will always use the base ExtensionArray). While for an extension type, you might want to add type-specific attributes or methods.
For canonical extension types that are implemented in Arrow C++ itself (for example, the currently discussed Tensor extension type in #8510, or a previous effort to add complex type as extension type in #10565), I think it will work today to create a custom subclass of
pyarrow.BaseExtensionType
for the specific canonical type, and then we could add a special case topyarrow_wrap_data_type
checking the name of the extension type, and if it is a canonical one we implement ourselves, use the python subclass we implemented ourselves.But for extension types that are implemented in C++ externally (or for extension types that are implemented in Arrow C++, but for which we don't provide a custom python subclass), that doesn't work.
I am wondering to what extent we want to allow "registering" a python class that should be used when wrapping a specific C++ extension type (and to what extent this would be useful for
The text was updated successfully, but these errors were encountered: