New function for applying vectorized functions for unlabeled arrays to xarray objects #964
coords = merge_coords_without_align(coord_variables)
name = result_name(args)

data_vars = [getattr(a, 'variable') for a in args]
Here (and for variables above), this could be a normal a.variable rather than getattr...?
This looks awesome! Would simplify a lot of the existing op stuff!
if not set(valid_core_dims_for_axis) <= axis_dims:
    raise ValueError('axis dimensions %r have overlap with core '
                     'dimensions %r, but do not appear at the start'
                     % (axis_dims, core_dims))
Why does the order of the passed dims matter? (i.e. why not transpose them all into the order that's needed)
Once this is done, could da[bool_array] = 5 be sugar for da.where(bool_array, 5)? I.e., do we get multidimensional indexing for free?
@MaximilianR Two issues come to mind with remapping
|
Thanks for thinking through these
I think that makes sense.
The way I was thinking about it, both are checked and broadcast against da:
assert set(other.dims) <= set(da.dims)
assert set(bool_array.dims) <= set(da.dims)
other, _ = xr.broadcast(other, da)
bool_array, _ = xr.broadcast(bool_array, da)
da.where(bool_array, other)
Is that consistent with the joins you were thinking of?
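A runnable sketch of the steps above, for anyone who wants to try it. The array contents are made up for illustration, and it relies on the two-argument form of da.where, which current xarray provides but which may not match what was available when this was discussed. Note that where keeps the values of da where the condition is True and takes other elsewhere.

```python
import numpy as np
import xarray as xr

# Illustrative inputs (not from the discussion): a 2x3 array, a mask along
# "y" and replacement values along "x".
da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
bool_array = xr.DataArray([True, False, True], dims="y")
other = xr.DataArray([10.0, 20.0], dims="x")

# Both the mask and the replacement may only use dimensions of da.
assert set(other.dims) <= set(da.dims)
assert set(bool_array.dims) <= set(da.dims)

# Broadcast each of them against da by dimension name.
other_b, _ = xr.broadcast(other, da)
bool_b, _ = xr.broadcast(bool_array, da)

# Keep da where the mask is True, take the replacement where it is False.
result = da.where(bool_b, other_b).transpose("x", "y")
```

The explicit broadcast is mostly there to mirror the comment; where aligns its arguments by dimension name on its own.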
This is now tested and ready for review. The API could particularly use feedback -- please take a look at the docstring and examples in the first comment. Long desired operations, like a fill value for
I have not yet hooked this up to the rest of xarray's code base, both because the set of changes we will be able to do with this are quite large, and because I'd like to give other contributors a chance to help/test. Note that the general version of
Finally, given the generality of this operation, I'm considering renaming it from
# type: List[object] -> Any
# use the same naming heuristics as pandas:
# https://github.com/blaze/blaze/issues/458#issuecomment-51936356
names = set(getattr(obj, 'name', None) for obj in objects)
FWIW this could be a set comprehension
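For reference, a minimal sketch of the set-comprehension form. Only the names line comes from the diff; the rest of the function body is an assumption about the pandas-style heuristic (keep the name if all named inputs agree, otherwise drop it), and the Named class is a hypothetical stand-in for objects with a name attribute.

```python
def result_name(objects):
    # Set comprehension instead of set(<generator expression>).
    names = {getattr(obj, "name", None) for obj in objects}
    # Assumed heuristic: ignore unnamed inputs, keep a name only if unique.
    names.discard(None)
    if len(names) == 1:
        (name,) = names
    else:
        name = None
    return name


class Named:
    """Hypothetical stand-in for an object with a name attribute."""

    def __init__(self, name):
        self.name = name


same = result_name([Named("t"), Named("t"), 1.5])
conflict = result_name([Named("t"), Named("u")])
```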
Would it be possible to write something like np.einsum with xarray named dimensions? I think it's possible, by supplying the dimensions to sum over, and broadcasting the others. Similar to the
return tuple(x)


class Signature(object):
There is already a Signature class in Python 3, FYI.
Switched to UFuncSignature to minimize potential confusion.
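The existing class is presumably inspect.Signature from the standard library (an assumption; the comment does not name the module). A short illustration of what that class is for, using a made-up function:

```python
import inspect


def magnitude(x, y):
    # Made-up function, used only to have something to inspect.
    return (x ** 2 + y ** 2) ** 0.5


# inspect.signature returns an inspect.Signature describing the parameters.
sig = inspect.signature(magnitude)
is_builtin_signature = isinstance(sig, inspect.Signature)
param_names = list(sig.parameters)
```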
One of the tricky things with I'd appreciate feedback on which cases are most essential and which can wait until later (this PR is already getting pretty big). Also, I'd appreciate ideas for how to make the API more easily understood. We will have extensive docs either way, but How
|
Any hope to get dask support? Even with the limitation of having 1:1 matching between input and output chunks, it would already be tremendously useful. In other words, it should be easy to automatically call dask.array.map_blocks
I worked around the limitation. It would be nice if apply() did the below automatically!
|
I'm thinking about making a few tweaks and merging this, but not exposing it to users yet as part of the public API. The public API is not quite there yet, but even as it is, I think it would be a useful building block for internal functionality (e.g., for #1065), and then other people could start to build on this as well.
I'm thinking through how difficult it would be to add a back-fill method to Would this PR help? I'm trying to wrap my head around the options. Thanks
Yes, quite likely. In the current state, it would depend on whether you want to back-fill all variables or just data variables (only the latter is currently supported). Either way, the first step is probably to write a function
Right. Let me know if I'm missing (pun intended - long day) something. Is there a library of these sorts of functions over n-dims somewhere else (even R / Julia)? Or are we really the first people in the world to be doing this?
Usually I check
Gave this a quick spin for filling. A few questions:
da=xr.DataArray(np.random.rand(10,3), dims=('x','y'))
da = da.where(da>0.5)
In [43]: da
Out[43]:
<xarray.DataArray (x: 10, y: 3)>
array([[ nan, 0.57243305, 0.84363016],
[ nan, 0.90788156, nan],
[ nan, 0.50739189, 0.93701278],
[ nan, nan, 0.86804167],
[ nan, 0.50883914, nan],
[ nan, nan, nan],
[ nan, 0.91547763, nan],
[ 0.72920182, nan, 0.6982745 ],
[ 0.73033449, 0.950719 , 0.73077113],
[ nan, nan, 0.72463932]])
In [44]: xr.apply(bn.push, da)  # already better than `bn.push(da)`!
Out[44]:
<xarray.DataArray (x: 10, y: 3)>
array([[ nan, 0.57243305, 0.84363016],
[ nan, 0.90788156, 0.90788156],
[ nan, 0.50739189, 0.93701278],
[ nan, nan, 0.86804167],
[ nan, 0.50883914, 0.50883914],
[ nan, nan, nan],
[ nan, 0.91547763, 0.91547763],
[ 0.72920182, 0.72920182, 0.6982745 ],
[ 0.73033449, 0.950719 , 0.73077113],
[ nan, nan, 0.72463932]])
# but changing the axis is verbose and transposes the array - are there existing tools for this?
In [48]: xr.apply(bn.push, da, signature='(x)->(x)', new_coords=[dict(x=da.x)])
Out[48]:
<xarray.DataArray (y: 3, x: 10)>
array([[ nan, nan, nan, nan, nan,
nan, nan, 0.72920182, 0.73033449, 0.73033449],
[ 0.57243305, 0.90788156, 0.50739189, 0.50739189, 0.50883914,
0.50883914, 0.91547763, 0.91547763, 0.950719 , 0.950719 ],
[ 0.84363016, 0.84363016, 0.93701278, 0.86804167, 0.86804167,
0.86804167, 0.86804167, 0.6982745 , 0.73077113, 0.72463932]])
Coordinates:
* x (x) int64 0 1 2 3 4 5 6 7 8 9
o y (y) -
|
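One way to fill along x without ending up with a transposed result, sketched with the keyword names of the later public xr.apply_ufunc (input_core_dims, output_core_dims) rather than the signature string used in this session. The helper push_last_axis is a hypothetical numpy stand-in for bn.push, so the snippet does not need bottleneck.

```python
import numpy as np
import xarray as xr


def push_last_axis(arr):
    # Forward-fill NaNs along the last axis (hypothetical stand-in for bn.push).
    mask = np.isnan(arr)
    # Index of the most recent non-NaN entry at each position (0 if none yet).
    idx = np.where(~mask, np.arange(arr.shape[-1]), 0)
    np.maximum.accumulate(idx, axis=-1, out=idx)
    return np.take_along_axis(arr, idx, axis=-1)


def ffill(da, dim):
    # Core dimensions are moved to the end, so restore the original order.
    filled = xr.apply_ufunc(
        push_last_axis,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
    )
    return filled.transpose(*da.dims)


da = xr.DataArray([[np.nan, 1.0, np.nan], [2.0, np.nan, np.nan]], dims=("x", "y"))
filled_x = ffill(da, "x")
```

The transpose at the end is the only extra step needed to keep the caller's dimension order.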
@shoyer - do we want to get this into 0.9 as a private api function and aim to complete it for the public api by 0.10 or so?
@shoyer - any plans to add dask support as suggested above?
Yes, in fact I have a branch with some basic support for this that I was working on a few months ago. I haven't written tests yet but I can potentially push that WIP to another PR after merging this. There are a couple of recent feature additions to
Yes, this seems like a good goal. I'll take another look over this next week when I have the chance, to remove any work-in-progress bits that have snuck in and remove the public facing API.
I removed the public facing API and renamed the (now private) apply function back to As discussed above, the current API with
OK, in it goes. Once again, there's no public API exposed yet.
Congrats!
FWIW the
#1245 replaces the unintuitive
This PR creates a new public facing function, xarray.apply_ufunc, which handles all the logic of applying numpy generalized universal functions to xarray's labelled arrays, including automatic alignment, merging coordinates, broadcasting and reapplying labels to the result.
Note that although we use the gufunc interface here, this works for far more than gufuncs. Any function that handles broadcasting in the usual numpy way will do. See below for examples.
Now that this logic is all in one place, we will even be able to (in a follow-up PR) include hooks for setting output array names and attributes based on input (e.g., to allow third party libraries to add unit support #525).
Xref #770
Examples
Calculate the vector magnitude of two arguments:
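The original snippet is not preserved here. A minimal sketch of what such a function might look like, assuming the call form of the later public xr.apply_ufunc (the exact API in this PR may have differed); the input values are made up.

```python
import numpy as np
import xarray as xr


def magnitude(a, b):
    # The wrapped function only ever sees plain numpy arrays.
    func = lambda x, y: np.sqrt(x ** 2 + y ** 2)
    return xr.apply_ufunc(func, a, b)


a = xr.DataArray([3.0, 5.0], dims="x")
b = xr.DataArray([4.0, 12.0], dims="x")
result = magnitude(a, b)
```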
Compute the mean (.mean):
Inner product over a specific dimension:
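The code for the mean and inner-product examples is missing from this copy. A hedged sketch of both, assuming the input_core_dims and kwargs keywords of the later public xr.apply_ufunc instead of the signature string discussed in this PR; the helper _inner and the sample data are illustrative.

```python
import numpy as np
import xarray as xr


def mean(obj, dim):
    # Core dimensions are moved to the end, so reduce over the last axis.
    return xr.apply_ufunc(
        np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
    )


def _inner(x, y):
    # Sum of products over the last axis, broadcasting over the others.
    return np.einsum("...i,...i->...", x, y)


def inner_product(a, b, dim):
    return xr.apply_ufunc(_inner, a, b, input_core_dims=[[dim], [dim]])


da = xr.DataArray([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dims=("x", "y"))
mean_y = mean(da, "y")
inner_x = inner_product(da, da, "x")
```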
Stack objects along a new dimension (like xr.concat):
Singular value decomposition:
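The code for the stacking and SVD examples is also missing here. A rough sketch of both, again assuming the output_core_dims keyword of the later public xr.apply_ufunc. The function names, the new dimension name "k", and the sample data are all illustrative, and this stack version does not stack coordinates.

```python
import numpy as np
import xarray as xr


def stack(objects, dim, new_coord):
    # New core dimensions are appended as the last axis of the result.
    func = lambda *arrays: np.stack(arrays, axis=-1)
    result = xr.apply_ufunc(func, *objects, output_core_dims=[[dim]])
    result[dim] = new_coord
    return result


def svd(da, row, col, new_dim="k"):
    # Reduced SVD over (row, col); returns u, s, vt as labelled arrays.
    return xr.apply_ufunc(
        np.linalg.svd,
        da,
        input_core_dims=[[row, col]],
        output_core_dims=[[row, new_dim], [new_dim], [new_dim, col]],
        kwargs={"full_matrices": False},
    )


a = xr.DataArray([1.0, 2.0], dims="x")
b = xr.DataArray([3.0, 4.0], dims="x")
stacked = stack([a, b], "z", ["a", "b"])

matrix = xr.DataArray(np.arange(6.0).reshape(3, 2), dims=("row", "col"))
u, s, vt = svd(matrix, "row", "col")
```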
Signature/Docstring