TL;DR: I think we want `A.sel(x=A.x).equals(A)` to pass lazily. It currently doesn't.
Currently, `A.sel(x=A.x)` inserts a `getitem` call into the dask graph, which breaks our lazy array equality optimization. Here's an example:
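```python
>>> import numpy as np
>>> import xarray as xr
>>> A = xr.DataArray(np.arange(100), dims="x", coords={"x": np.arange(100)}).chunk({"x": 1})
>>> # lazy_array_equiv returns None when equality can't be decided without computing
>>> print(A.sel(x=A.x).variable.equals(A, equiv=xr.core.duck_array_ops.lazy_array_equiv))
None
```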
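To see why the extra `getitem` matters, here is a minimal stdlib-only toy model of token-based lazy equality. It is loosely modeled on how `lazy_array_equiv` behaves for dask arrays (roughly: `True` for the same object or matching graph tokens, `False` on a shape mismatch, `None` when it can't decide), but `ToyLazyArray`, `lazy_equiv`, `sel_with_short_circuit`, and the token scheme are hypothetical stand-ins, not xarray or dask code.

```python
import hashlib
from typing import Optional, Sequence


class ToyLazyArray:
    """Hypothetical stand-in for a dask array: a shape plus a graph token."""

    def __init__(self, shape: tuple, token: str):
        self.shape = shape
        self.token = token

    def getitem(self, key: Sequence[int]) -> "ToyLazyArray":
        # Every indexing operation adds a graph layer, so the result gets a
        # new token even when the key selects every element in order.
        raw = f"getitem-{self.token}-{tuple(key)}".encode()
        return ToyLazyArray((len(key),), hashlib.sha1(raw).hexdigest())


def lazy_equiv(a: ToyLazyArray, b: ToyLazyArray) -> Optional[bool]:
    """True if provably equal, False if provably different, None if unknown."""
    if a is b:
        return True
    if a.shape != b.shape:
        return False
    if a.token == b.token:
        return True
    return None  # deciding would require computing the values


def sel_with_short_circuit(arr: ToyLazyArray, key: Sequence[int]) -> ToyLazyArray:
    # Hypothetical short-circuit: a full-length, in-order selection is a no-op,
    # so return the input unchanged instead of adding a getitem layer.
    if list(key) == list(range(arr.shape[0])):
        return arr
    return arr.getitem(key)


A = ToyLazyArray((100,), "arange-100")
full = list(range(100))
print(lazy_equiv(A.getitem(full), A))                  # None
print(lazy_equiv(sel_with_short_circuit(A, full), A))  # True
```

The first comparison mirrors the current behaviour: the values are identical, but the new token hides that. The second mirrors the short-circuit proposed below, where returning the input unchanged keeps the check lazy.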
Questions:
- Where is the best place to do this: in `sel`, in `isel`, or both?

  Sticking the following in `sel` makes the above check return `True`, which is what we want:

  ```python
  # Inside sel: return self unchanged if every indexer equals the existing index
  # (assumes each indexer is a DataArray, so .to_index() is available)
  if self._indexes:
      equals = []
      for index in indexers:
          equals.append(indexers[index].to_index().equals(self._indexes[index]))
      if all(equals):
          return self
  ```

  This doesn't handle slice objects, though, which makes me think we'd want to add something similar to `isel` too.
- What is the behaviour we want: `A.sel(x=A.x).equals(A)` or `A.sel(x=A.x) is A`?
- Doing the latter will mean changing `_to_temp_dataset` and `_from_temp_dataset`, which suggests we'd need the constraint `A._from_temp_dataset(A._to_temp_dataset()) is A` to hold. But this seems too strong to me. Do we only want to lazily satisfy an `equals` constraint rather than an `identical` constraint?
- It seems like we'll want to add such short-circuits in many places (I have not checked all of these): `sortby`, `broadcast`, `align`, `reindex` (`transpose` already does this).
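For the `isel` question above, the no-op check has to recognize slices as well as integer arrays. A minimal stdlib-only sketch, assuming positional indexers are scalars, `slice` objects, or sequences of integers; `is_identity_indexer` and `isel_short_circuit` are hypothetical names, not xarray API. It relies on the built-in `slice.indices`, which resolves `None`, negative, and out-of-range bounds against the dimension length.

```python
from typing import Mapping, Sequence, Union

Indexer = Union[int, slice, Sequence[int]]


def is_identity_indexer(key: Indexer, size: int) -> bool:
    """Return True if `key` selects positions 0..size-1 in order (a no-op)."""
    if isinstance(key, int):
        # A scalar drops the dimension, so it can never be a no-op.
        return False
    if isinstance(key, slice):
        # slice.indices resolves None/negative/out-of-range bounds against size.
        start, stop, step = key.indices(size)
        return range(start, stop, step) == range(size)
    # Integer sequences must be exactly 0, 1, ..., size - 1 (conservative:
    # negative positions are not normalized and count as non-identity).
    return len(key) == size and all(k == i for i, k in enumerate(key))


def isel_short_circuit(sizes: Mapping[str, int], indexers: Mapping[str, Indexer]) -> bool:
    """Hypothetical check: True if isel(**indexers) would return the input unchanged."""
    return all(
        dim in sizes and is_identity_indexer(key, sizes[dim])
        for dim, key in indexers.items()
    )


sizes = {"x": 100}
print(isel_short_circuit(sizes, {"x": slice(None)}))            # True
print(isel_short_circuit(sizes, {"x": list(range(100))}))       # True
print(isel_short_circuit(sizes, {"x": slice(None, None, -1)}))  # False
```

When the check passes, `isel` could return `self` before touching the data, which is the same pattern as the `sel` snippet but also covers slices.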