optimizing xarray operations for lazy array equality test #3558

@dcherian

TLDR: I think we want A.sel(x=A.x).equals(A) to pass lazily, i.e. without computing any dask chunks. It currently does not.

Currently, if I do A.sel(x=A.x), this inserts a getitem call into the dask graph, which breaks our lazy array equality optimization. Here's an example:

>>> A = xr.DataArray(np.arange(100), dims="x", coords={"x": np.arange(100)}).chunk({"x": 1})
>>> A.sel(x=A.x).variable.equals(A, equiv=xr.core.duck_array_ops.lazy_array_equiv)
None

Questions:

  1. Where is the best place to do this? In sel or isel? Both?
    Adding the following to sel makes the above check return True, which is what we want:

        if self._indexes:
            # Short-circuit: if every indexer matches the existing index for
            # its dimension exactly, the selection is a no-op.
            if all(
                indexers[dim].to_index().equals(self._indexes[dim])
                for dim in indexers
            ):
                return self

    This doesn't handle slice objects, though, which makes me think we'd want to add something similar to isel too.

  2. What is the behaviour we want? A.sel(x=A.x).equals(A) or A.sel(x=A.x) is A?

  3. Doing the latter (A.sel(x=A.x) is A) would mean changing _to_temp_dataset and _from_temp_dataset, which suggests the constraint A._from_temp_dataset(A._to_temp_dataset()) is A. But this seems too strong to me. Do we only want to lazily satisfy an equals constraint rather than an identical constraint?

  4. It seems like we'll want to add such short-circuits in many places (I have not checked all of these): sortby, broadcast, align, reindex (transpose does this now).
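
To make the questions above concrete, here is a minimal, library-free sketch of the short-circuit idea. Everything in it is hypothetical: `is_noop_indexer` and `ToyArray` are illustration-only names, not xarray APIs, and plain lists stand in for pandas indexes and dask arrays. It assumes slices are positional (as in isel) and that a sequence indexer holds labels (as in sel).

```python
def is_noop_indexer(labels, indexer):
    """Return True if `indexer` selects every element of `labels`, in order.

    Hypothetical helper for illustration only; not part of xarray.
    """
    if isinstance(indexer, slice):
        # slice.indices() normalizes e.g. slice(None) and slice(0, n) to (0, n, 1).
        return indexer.indices(len(labels)) == (0, len(labels), 1)
    try:
        candidate = list(indexer)
    except TypeError:
        # Scalars (e.g. a single label) are never a no-op selection.
        return False
    return candidate == list(labels)


class ToyArray:
    """Toy stand-in for a 1-D labelled array (an assumption, not xarray)."""

    def __init__(self, labels, values):
        self.labels = list(labels)
        self.values = list(values)

    def equals(self, other):
        return self.labels == other.labels and self.values == other.values

    def sel(self, indexer, short_circuit=True):
        if short_circuit and is_noop_indexer(self.labels, indexer):
            # Returning self creates no new object, so a lazy backend
            # would not gain an extra getitem layer.
            return self
        if isinstance(indexer, slice):
            positions = range(*indexer.indices(len(self.labels)))
        else:
            positions = [self.labels.index(label) for label in indexer]
        return ToyArray(
            [self.labels[i] for i in positions],
            [self.values[i] for i in positions],
        )


A = ToyArray(range(5), [10, 11, 12, 13, 14])
same = A.sel(A.labels)                       # short-circuited
copy = A.sel(A.labels, short_circuit=False)  # rebuilt element by element
print(same is A, copy is A, copy.equals(A))
```

With the short-circuit, the no-op selection satisfies the stronger `is` constraint; without it, only `equals` holds. In xarray the cost of the check itself would matter too, since comparing large indexes on every sel call is not free.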
