combine_first by using apply_ufunc in ops.fillna #1204
Conversation
…ve _fillna from injection.
def _fillna(data, other):
    left, right = np.broadcast_arrays(data, other)
    result = left.copy()  # view must be copied before being written
    result[isnull(result)] = right[isnull(result)]
An advantage of using where(isnull(data), other, data) instead of all the logic in this function is that it works on dask arrays, too.
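(For illustration, a minimal runnable sketch of that approach using numpy stand-ins for xarray's internal where/isnull helpers; fillna_via_where is a hypothetical name:)

import numpy as np

def fillna_via_where(data, other):
    # np.where broadcasts its arguments and returns a new array,
    # so no explicit broadcast_arrays or defensive copy is needed
    return np.where(np.isnan(data), other, data)

fillna_via_where(np.array([1.0, np.nan]), np.array([9.0, 9.0]))  # -> array([1., 9.])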
@@ -382,7 +382,15 @@ def fillna(self, value):
        Dataset.fillna
        DataArray.fillna
        """
-        return self._fillna(value)
+        def _yield_applied(this, other):
I think apply_ufunc should even work on groupby objects, so you shouldn't need this helper function. I would be interested to know if I'm wrong about that, though.
So, if I replace the whole method with just return apply_ufunc(ops.fillna, self, value), I'd fail line 2507 of test_dataset.py, where self.assertEqual(actual.attrs, ds.attrs) fails, because the attrs aren't copied over.
One solution is perhaps to add something into apply_groupby_ufunc to make sure the attrs are copied over, similar to what's in _copy_attrs_from in dataset.py/dataarray.py.
One solution is perhaps to add something into apply_groupby_ufunc to make sure the attrs are copied over, similar to what's in _copy_attrs_from in dataset.py/dataarray.py.

I like this -- it would be useful for other functions in apply_ufunc, too. For consistency, I would call the new argument keep_attrs.

Or if you don't want to bother with that, you could just use _copy_attrs_from in this method:

out = ops.fillna(self, other)
out._copy_attrs_from(self._obj)
return out
I copped out and took your simple approach. It's mainly because I'm not sure how to deal with more than 2 objects in apply_ufunc when keep_attrs=True. Do I only take the first object's attributes? Maybe best left for another PR.
    result = left.copy()  # view must be copied before being written
    result[isnull(result)] = right[isnull(result)]
    return result
return apply_ufunc(_fillna, data, other, join=join)
should have dask_array='allowed'
It might be helpful to separate the join argument to apply_ufunc into two parts:
- The join used for aligning indexes.
- The join used for aligning data variables between datasets.
For fillna, I think we would always want data_vars_join='left', though I guess it's equivalent to data_vars_join='outer' when we constrain value not to contain new keys.
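(A hypothetical sketch of the distinction; data_vars_join is only the name proposed above, not an existing argument:)

import numpy as np
import xarray as xr

ds_l = xr.Dataset({'a': ('x', [1.0, np.nan])}, coords={'x': [0, 1]})
ds_r = xr.Dataset({'a': ('x', [9.0, 9.0]), 'b': ('x', [5.0, 5.0])},
                  coords={'x': [1, 2]})

# the index join decides which labels survive alignment:
# join='left' keeps only ds_l's labels, x = [0, 1]
aligned_l, aligned_r = xr.align(ds_l, ds_r, join='left')

# the data-variables join decides which variable names the result has:
# 'left' would keep {'a'}, while 'outer' would keep {'a', 'b'}
left_vars = set(ds_l.data_vars)
outer_vars = set(ds_l.data_vars) | set(ds_r.data_vars)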
of ``join='outer'``. Vacant cells in the expanded coordinates are
filled with np.nan.

Renders the same result as xr.merge([self, other]).
This is not quite right -- combine_first/fillna are OK with conflicting values in self and other, but merge will raise an error.
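(A small runnable sketch of that difference, with illustrative values:)

import xarray as xr

ds0 = xr.Dataset({'a': ('x', [1, 2])}, coords={'x': [0, 1]})
ds1 = xr.Dataset({'a': ('x', [99, 3])}, coords={'x': [1, 2]})

# combine_first tolerates the conflict at x=1 (ds0's value wins),
# giving a = [1, 2, 3] on the outer-joined index x = [0, 1, 2]
ds0.combine_first(ds1)

# merge refuses the same inputs: 'a' has conflicting values (2 vs 99)
# at the overlapping label x=1, so this raises MergeError
xr.merge([ds0, ds1])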
        out.attrs = self.attrs
        return out

    def combine_first(self, other):
I'm not sure it's worth making a separate method here to save a few characters.
I think either we have combine_first for both Dataset and DataArray, or we lump everything into fillna. The use of combine_first is just that there's this concrete, direct analogy from pandas. What's the decision here?
The fact that data_vars_join could sensibly differ for combine_first (as you have implemented it) suggests that a second method is reasonable. Consistency with pandas is also a nice factor. So I guess I've come around!
                             'must be contained in the original dataset')
        out = ops.fillna(self, value, join="left")
        out._copy_attrs_from(self)
        return out
This would be a good time to remove all the fillna-specific logic from Dataset._calculate_binary_op that is no longer used.
@@ -1955,7 +1955,35 @@ def fillna(self, value):
        -------
        Dataset
        """
-        out = self._fillna(value)
+        if utils.is_dict_like(value):
+            value_keys = value.data_vars.keys() if isinstance(value, Dataset)\
maybe slightly cleaner to use duck typing, e.g., value_keys = getattr(value, 'data_vars', value).keys()
        if not set(value_keys) <= set(self.data_vars.keys()):
            raise ValueError('all variables in the argument to `fillna` '
                             'must be contained in the original dataset')
        out = ops.fillna(self, value, join="left")
join should be a parameter for Dataset.fillna, too.
@@ -106,6 +106,13 @@ Deprecations

Enhancements
~~~~~~~~~~~~

- Added the xarray equivalent of `pandas.Dataframe.combine_first` as an instance
just a note -- this will need to go in a new section for 0.9.1, since 0.9.0 will probably be released first.
- Added the xarray equivalent of `pandas.Dataframe.combine_first` as an instance
  method to DataArray/Dataset objects, facilitated by the new `ops.fillna` that
  uses `apply_ufunc` for different `join` options.
I would drop implementation details, like the fact that this uses apply_ufunc, from the user-facing release notes.
…through ops.fillna. Also added docstring.
So I addressed some of the comments, but not all. Please let me know what you think.
This is coming together nicely.
def _fillna(data, other):
    left, right = np.broadcast_arrays(data, other)
    return np.where(isnull(left), right, left)
This should literally be:

def _fillna(data, other):
    return where(isnull(data), other, data)

The where and isnull functions defined in this module already do broadcasting, and work for both dask and numpy arrays.
        result_vars = apply_dict_of_variables_ufunc(
-            func, *args, signature=signature, join=join, fill_value=fill_value)
+            func, *args, signature=signature, join=data_vars_join,
It could be nice to have a unit test that explicitly covers data_vars_join in test_computation.py.
What is the expected behavior of the following?

ds0 = xr.Dataset({'a': ('x', [1, 2]), 'x': [0, 1]})
ds1 = xr.Dataset({'b': ('x', [99, 3]), 'x': [1, 2]})

def add(a, b, join, data_vars_join, **kwargs):
    return apply_ufunc(operator.add, a, b, join=join,
                       data_vars_join=data_vars_join, **kwargs)

out = add(ds0, ds1, 'outer', 'outer')

Currently, it raises a TypeError at result_vars[name] = func(*variable_args) of apply_dict_of_variables_ufunc.
On the other hand, if you run

out = add(ds0, ds2, 'outer', 'outer', dataset_fill_value=np.nan)

you get

>>> out
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    a        (x) float64 nan nan nan
    b        (x) float64 nan nan nan

While I can understand why the code renders such a result, as a user I was hoping for something more like

<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    a        (x) float64 1.0 2.0 nan
    b        (x) float64 nan 99.0 3.0

which would be a little more useful. Would we want to make changes for this?
You can make that work with dataset_fill_value=0:

In [11]: add(ds0, ds1, 'outer', 'outer', dataset_fill_value=0)
Out[11]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    a        (x) float64 1.0 2.0 nan
    b        (x) float64 nan 99.0 3.0

Are you suggesting changing the default value to 0? It's not obvious to me what the best choice for the default dataset_fill_value is. Maybe we should remove the default value of None and raise a more descriptive error if this happens.
A couple of things:

1.) What do you think about removing the join kwarg in the fillna signature in both dataarray.py and dataset.py? The writing of ds0.fillna(ds1) already implies a directionality, so adding a join option can lead to confusion.

2.) I've decided not to touch the default value of dataset_fill_value=None. It leads to errors in many other places once you change that. I'll look into raising some descriptive errors now that we have a data_vars_join option. But if we want to make

out = add(ds0, ds2, 'outer', 'outer', dataset_fill_value=np.nan)

render

<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    a        (x) float64 1.0 2.0 nan
    b        (x) float64 nan 99.0 3.0

I think we'll have to do another PR.
1.) What do you think about removing the join kwarg in the fillna signature in both dataarray.py and dataset.py? The writing of ds0.fillna(ds1) already implies a directionality, so adding a join option can lead to confusion.

Agreed, there isn't really a need for the join argument once we have combine_first.

For your example add(ds0, ds2, 'outer', 'outer', dataset_fill_value=np.nan), can you clarify what the values of ds0 and ds2 are?
Sorry, my mistake. That ds2 is supposed to be ds1, the same as above:

ds0 = xr.Dataset({'a': ('x', [1, 2]), 'x': [0, 1]})
ds1 = xr.Dataset({'b': ('x', [99, 3]), 'x': [1, 2]})
@@ -702,7 +713,8 @@ def stack(objects, dim, new_coord):
    elif any(is_dict_like(a) for a in args):
        return apply_dataset_ufunc(variables_ufunc, *args, signature=signature,
                                   join=join, exclude_dims=exclude_dims,
-                                  fill_value=dataset_fill_value)
+                                  fill_value=dataset_fill_value,
+                                  data_vars_join=data_vars_join)
Add this argument to the functools.partial for GroupBy objects in the clause above this one.
        self.assertDatasetEqual(actual, expected)
        self.assertDatasetEqual(actual, xr.merge([dsx0, dsx1]))

        dsy2 = DataArray([2, 2, 2], [('x', ['b', 'c', 'd'])]).\
Per PEP 8, please avoid writing \. Either split this onto two lines or use parentheses for implicit line continuation.
        expected = DataArray([[0, 0], [0, 0], [2, 2]],
                             [('x', ['a', 'b', 'd']), ('y', [-1, 0])])
        self.assertDataArrayEqual(actual, expected)
nit: two line breaks should separate code blocks, not three
    ar1.combine_first(ar0)
    ar0.combine_first(ar2)

For datasets, ``ds0.combine_first(ds1)`` works just like ``xr.merge([ds0, ds1])``
Again, this isn't quite true -- it succeeds in cases where merge will raise an error. It would be nice to highlight the difference here.
@@ -2324,11 +2356,6 @@ def _calculate_binary_op(self, f, other, join='inner',
                         inplace=False, fillna=False):

    def apply_over_both(lhs_data_vars, rhs_data_vars, lhs_vars, rhs_vars):
        if fillna and join != 'left':
you can remove the fillna argument from the function signature entirely (and also from Dataset._binary_op)
@@ -13,6 +13,7 @@ Combining data

* For combining datasets or data arrays along a dimension, see concatenate_.
* For combining datasets with different variables, see merge_.
# For combining datasets or data arrays with outer-join alignment, see combine_.
Maybe: For combining datasets with different coordinates or missing values
I would simply copy attributes from the first argument for keep_attrs. We don't do any complex merging of attributes on any xarray methods.
I'm not sure why the continuous integration check is taking forever.....
Appveyor is acting up. I tried cancelling and restarting the build -- we'll see if that works. But the Travis-CI tests pass, so I wouldn't worry about it.
Looks like the build is starting now.
@@ -664,8 +676,10 @@ def stack(objects, dim, new_coord):

    signature = kwargs.pop('signature', None)
    join = kwargs.pop('join', 'inner')
    data_vars_join = kwargs.pop('data_vars_join', 'inner')
    keep_attrs = kwargs.pop('keep_attrs', True)
This should default to False, as it does for arithmetic.
                                   join=join, exclude_dims=exclude_dims,
                                   fill_value=dataset_fill_value,
                                   data_vars_join=data_vars_join)
    if keep_attrs and (type(first_obj) is type(out)):
I would move keep_attrs one level up into apply_dataset_ufunc and apply_dataarray_ufunc.
     exclude_dims = kwargs.pop('exclude_dims', frozenset())
-    dataset_fill_value = kwargs.pop('dataset_fill_value', None)
+    dataset_fill_value = kwargs.pop('dataset_fill_value', np.nan)
I would actually rather force dataset_fill_value to be set explicitly. One option is to create a top-level constant for the default:

_DEFAULT_FILL_VALUE = object()
...
dataset_fill_value = kwargs.pop('dataset_fill_value', _DEFAULT_FILL_VALUE)

Then, in collect_dict_values, raise an error if you need to use the fill value and fill_value is _DEFAULT_FILL_VALUE.
time to merge this guy?
checking in on this PR...
@@ -258,6 +270,9 @@ def join_dict_keys(objects, how='inner'):

def collect_dict_values(objects, keys, fill_value=None):
    # type: (Iterable[Union[Mapping, Any]], Iterable, Any) -> List[list]
    if fill_value is _DEFAULT_FILL_VALUE:
        raise ValueError('Inappropriate fill value for Dataset: {}'
The default fill value doesn't have an informative repr, so I wouldn't print it in the error message. Instead, say something like: "To apply an operation to datasets with different data variables, you must supply the dataset_fill_value argument."
Also: we should only raise this error if filling actually needs to be done, or better yet exactly when data_vars_join != 'inner' (so the function behavior does not depend on data values). In the current version, you always need to supply this argument when applying operations to datasets, which is too strict.
If we switch to raising when data_vars_join != 'inner', then this should be a TypeError and be moved up into apply_dataset_ufunc.
@@ -202,6 +206,8 @@ def apply_dataarray_ufunc(func, *args, **kwargs):
    signature = kwargs.pop('signature')
    join = kwargs.pop('join', 'inner')
    exclude_dims = kwargs.pop('exclude_dims', _DEFAULT_FROZEN_SET)
    keep_attrs = kwargs.pop('keep_attrs', False)
    first_obj = args[0]  # we'll copy attrs from this in case keep_attrs=True
Minor style suggestion: move this line to just above the if keep_attrs below; then you don't need the comment to explain why it exists.
@@ -89,7 +89,8 @@ def test_apply_identity():
    data_array = xr.DataArray(variable, [('x', -array)])
    dataset = xr.Dataset({'y': variable}, {'x': -array})

-    identity = functools.partial(apply_ufunc, lambda x: x)
+    identity = functools.partial(apply_ufunc, lambda x: x,
+                                 dataset_fill_value=np.nan)
As mentioned above, we shouldn't need to set dataset_fill_value unless data_vars_join has been modified from the default.
     exclude_dims = kwargs.pop('exclude_dims', frozenset())
-    dataset_fill_value = kwargs.pop('dataset_fill_value', None)
+    dataset_fill_value = kwargs.pop('dataset_fill_value', _DEFAULT_FILL_VALUE)
Maybe we should rename this argument to data_vars_fill_value, to help clarify that it is paired with data_vars_join? I would put them next to each other in the docstring, too.
Or maybe the pair should be dataset_fill_value/dataset_join? dataset_data_vars_fill_value/dataset_data_vars_join is probably too long :).
                           'b': ('x', [np.nan, np.nan, np.nan]),
                           'x': [0, 1, 2]})
    assert_identical(actual, expected)
Add a test to verify that an appropriate error is raised (use pytest.raises) if dataset_fill_value is not provided when it is necessary.
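(A hedged sketch of such a test; the import path and keyword names follow the discussion above and may not match the final code:)

import operator
import pytest
import xarray as xr
from xarray.core.computation import apply_ufunc  # assumed module path

def test_dataset_fill_value_required():
    ds0 = xr.Dataset({'a': ('x', [1, 2])}, coords={'x': [0, 1]})
    ds1 = xr.Dataset({'b': ('x', [99, 3])}, coords={'x': [1, 2]})
    with pytest.raises(TypeError):
        # different data_vars and dataset_join='outer', but no fill value
        apply_ufunc(operator.add, ds0, ds1,
                    join='outer', dataset_join='outer')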
@@ -473,6 +480,37 @@ def test_broadcast_compat_data_2d():
                        broadcast_compat_data(var, ('w', 'y', 'x', 'z'), ()))
Would be good to add a test in here for keep_attrs.
@@ -13,6 +13,8 @@ Combining data

* For combining datasets or data arrays along a dimension, see concatenate_.
* For combining datasets with different variables, see merge_.
# For combining datasets with different coordinates or missing values,
use * at the start for consistency.
Maybe say "datasets or data arrays"?
Also, maybe "different indexes" instead of "different coordinates"?
@@ -13,6 +13,8 @@ Combining data

* For combining datasets or data arrays along a dimension, see concatenate_.
* For combining datasets with different variables, see merge_.
# For combining datasets with different coordinates or missing values,
  or data arrays with outer-join alignment, see combine_.
I would drop "or data arrays with outer-join alignment" -- it's a little confusing.
@@ -13,8 +13,7 @@ Combining data

* For combining datasets or data arrays along a dimension, see concatenate_.
* For combining datasets with different variables, see merge_.
-# For combining datasets with different coordinates or missing values,
-  or data arrays with outer-join alignment, see combine_.
+* For combining datasets or data arrays with different indexes or missing values.
you lost "see combine" here
if dataset_join != 'inner' and fill_value is _DEFAULT_FILL_VALUE:
    raise TypeError('To apply an operation to datasets with different ',
                    'data variables, you must supply the ',
add if dataset_join != 'inner' to the docstring
dataset_fill_value : optional
    Value used in place of missing variables on Dataset inputs when the
    datasets do not share the exact same ``data_vars``. Only relevant if
    ``dataset_join != 'inner'``.
Maybe switch the last sentence to:
Required if dataset_join != 'inner', otherwise ignored.
@@ -607,23 +609,23 @@ def apply_ufunc(func, *args, **kwargs):
    - 'inner': use the intersection of object indexes
    - 'left': use indexes from the first object with each dimension
    - 'right': use indexes from the last object with each dimension
-data_vars_join : {'outer', 'inner', 'left', 'right'}, optional
+dataset_join : {'outer', 'inner', 'left', 'right'}, optional
Can you also fix the function signature at the top of the docstring to add in the new arguments?
Enhancements
~~~~~~~~~~~~

- Added the xarray equivalent of `pandas.Dataframe.combine_first` as an instance
Actually let's move this into v0.9.0. I think this change is pretty safe, and it would be nice to get it in promptly.
Thanks for persevering on this, @chunweiyuan!
tears in my eyes.......
Implementing ops.fillna using apply_ufunc, with a join kwarg. The new ops.fillna is then used to implement combine_first in both DataArray and Dataset objects.
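A quick usage sketch of the resulting method (values illustrative):

import xarray as xr

ar0 = xr.DataArray([0, 0], [('x', ['a', 'b'])])
ar1 = xr.DataArray([1, 1, 1], [('x', ['a', 'b', 'c'])])

ar0.combine_first(ar1)
# outer-joins the indexes and keeps ar0's values wherever they are not NaN:
# array([0., 0., 1.]) with x = ['a', 'b', 'c']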