Consistent Handling of Type Casting Hierarchy #3950

Open
@jthielen

As brought up in #3643, there appear to be some inconsistencies in how xarray handles other numeric/duck array types with regard to a well-defined type casting hierarchy across operations. For example, consider the following areas:

Construction/Wrapping

  • Allows
    • xarray.core.indexing.ExplicitlyIndexed
    • pandas.Index
    • Dask array
    • __array_function__ implementers
  • Automatically converts
    • Anything with a values attribute to its values
    • Datetime-like array types
    • Masked arrays
    • Anything else for which np.asarray(data) is valid
  • Doesn't reject any type when trying to wrap (for an upcast type such as a HoloViews Dataset, this may be needed?)
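To make the construction behaviour above concrete, here is a minimal sketch of the same decision order. `wrap_data`, `HasValues`, and `Opaque` are hypothetical names made up for this illustration, not xarray's actual code, and filling masked entries with NaN is an assumption about what "automatically converts" means for masked arrays.

```python
import numpy as np


def wrap_data(data):
    """Hypothetical helper mirroring the bullets above (not xarray's code)."""
    # Masked arrays are converted; filling with NaN is an assumption here.
    if isinstance(data, np.ma.MaskedArray):
        return data.astype(float).filled(np.nan)
    # __array_function__ implementers (including ndarray) pass through as-is.
    if hasattr(type(data), "__array_function__"):
        return data
    # Anything with a `values` attribute is replaced by its values, and
    # everything else falls through to np.asarray, which rejects nothing.
    return np.asarray(getattr(data, "values", data))


class HasValues:
    """Toy stand-in for an object exposing a `values` attribute."""
    values = [1, 2, 3]


class Opaque:
    """Toy stand-in for an upcast type that arguably should not be wrapped."""


masked = np.ma.array([1, 2, 3], mask=[False, True, False])
print(wrap_data(masked))
print(wrap_data(HasValues()))
print(wrap_data(Opaque()).dtype)  # silently becomes a 0-d object array
```

The last call is the case raised in the final bullet: nothing stops an unrelated (possibly higher-priority) type from being coerced into an object array.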

Binary Ops

  • Defers based on xarray's internal hierarchy (Dataset, DataArray, Variable), otherwise relies upon methods of underlying data, and then wraps result.

(would be one less category to worry about if refactored to use __array_ufunc__, see #3936 (comment))
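As a rough illustration of hierarchy-based deferral in binary ops, the sketch below uses Python's `NotImplemented` protocol so that the higher type always handles the operation. The classes `Wrapper`, `VariableLike`, and `DataArrayLike` and the `priority` attribute are hypothetical stand-ins, not xarray's implementation.

```python
import numpy as np


class Wrapper:
    """Hypothetical container; `priority` stands in for hierarchy position."""
    priority = 0

    def __init__(self, data):
        self.data = data

    def _binary(self, other, op):
        if isinstance(other, Wrapper):
            if other.priority > self.priority:
                # Defer so Python tries the higher type's reflected method.
                return NotImplemented
            other = other.data
        # Rely on the underlying data's own operator, then wrap the result.
        return type(self)(op(self.data, other))

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __radd__(self, other):
        return self._binary(other, lambda a, b: b + a)


class VariableLike(Wrapper):
    priority = 1


class DataArrayLike(Wrapper):
    priority = 2


low = VariableLike(np.array([1, 2]))
high = DataArrayLike(np.array([10, 20]))

# Either operand order ends up handled by the higher type.
print(type(low + high).__name__, (low + high).data)
print(type(high + low).__name__, (high + low).data)
```

This only covers operators; ufuncs and `__array_function__` calls take separate code paths, which is where the inconsistency described in this issue comes from.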

__array_ufunc__

  • Allows a list of supported types
    _HANDLED_TYPES = (
        np.ndarray,
        np.generic,
        numbers.Number,
        bytes,
        str,
    ) + dask_array_type

    along with SupportsArithmetic
  • Defers to all other types

__array_function__

One concrete example of where this has been problematic is with xarray DataArrays and Pint Quantities (#3643). xarray DataArray is above Pint Quantity in the (generally agreed upon) type casting hierarchy, and wrapping and binary ops work properly since Pint Quantities defer and xarray DataArrays handle the operation. However, ufuncs fail because each type attempts to defer to the other. Having a consistent way of handling type compatibility across all relevant areas in xarray should remove these kinds of issues.
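The mutual-deferral failure can be reproduced without xarray or Pint. In the sketch below, `Lower`, `StrictUpper`, and `TolerantUpper` are hypothetical toy classes: `Lower` plays the role of the lower type that defers to unknown types, `StrictUpper` uses a fixed allow-list like the one above, and `TolerantUpper` shows one possible fix (an assumption, not a proposal from this issue) where the upper type unwraps itself and lets the wrapped operation proceed.

```python
import numbers
import numpy as np


class Lower:
    """Hypothetical lower-priority duck array that defers to unknown types."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        allowed = (np.ndarray, numbers.Number, Lower)
        if not all(isinstance(x, allowed) for x in inputs):
            return NotImplemented  # defer to the (assumed higher) type
        raw = [x.data if isinstance(x, Lower) else x for x in inputs]
        return Lower(getattr(ufunc, method)(*raw, **kwargs))


class StrictUpper:
    """Hypothetical wrapper that only handles a fixed allow-list of types."""
    _HANDLED_TYPES = (np.ndarray, np.generic, numbers.Number, bytes, str)

    def __init__(self, data):
        self.data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        allowed = self._HANDLED_TYPES + (StrictUpper,)
        if not all(isinstance(x, allowed) for x in inputs):
            return NotImplemented  # also defers, so nobody handles the call
        raw = [x.data if isinstance(x, StrictUpper) else x for x in inputs]
        return type(self)(getattr(ufunc, method)(*raw, **kwargs))


class TolerantUpper(StrictUpper):
    """Same wrapper, but it unwraps itself and handles any other input."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raw = [x.data if isinstance(x, StrictUpper) else x for x in inputs]
        return type(self)(getattr(ufunc, method)(*raw, **kwargs))


try:
    np.add(StrictUpper(np.array([1.0, 2.0])), Lower([10.0, 20.0]))
except TypeError as err:
    print("both deferred:", err)

result = np.add(TolerantUpper(np.array([1.0, 2.0])), Lower([10.0, 20.0]))
print(type(result).__name__, type(result.data).__name__, result.data.data)
```

With the strict allow-list, both `__array_ufunc__` implementations return `NotImplemented` and NumPy raises `TypeError`. With the tolerant variant, the upper type wraps a `Lower` result, matching the hierarchy described above.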

However, it would be good to keep in mind that the broader ecosystem doesn't yet seem to have an agreed-upon way of doing this, so this would still be treading in uncertain waters for the moment. I've been operating under these assumptions when working with Pint, but I definitely think there is a need for more authoritative guidance.

Also, if I'm mistaken in any of the things mentioned above, please do let me know!

cc @keewis, @shoyer

Labels: topic-arrays (related to flexible array support)