Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataArrayCoarsen.reduce passes kwargs to coordinate function #8059

Open
saskartt opened this issue Aug 9, 2023 · 6 comments
Open

DataArrayCoarsen.reduce passes kwargs to coordinate function #8059

saskartt opened this issue Aug 9, 2023 · 6 comments

Comments

@saskartt
Copy link

saskartt commented Aug 9, 2023

What is your issue?

When using custom reducing function with the .reduce(func, keep_attrs=None, **kwargs) method of xarray.core.rolling.DataArrayCoarsen object, keyword arguments of the reduce method are passed to the given reducing function (func) as documented. However, if a different function is given to reduce the coordinates when the coarsen or rolling object is created, e.g. coord_func="mean", the same kwargs get also passed to the coordinate reducing function. This is quite often undesired, as the function to reduce the coordinates might not have the same kwargs as the function used to reduce the actual data.

Here is an example:

import numpy as np
import xarray as xr

def reduce_func_with_kwargs(x, axis, dummy=None):
    dummy = 0
    return np.sum(x, axis=axis)

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

da = xr.DataArray(
    data=temperature,
    dims=["x", "y"],
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
    ),
    attrs=dict(
        description="Ambient temperature.",
        units="degC",
    ),
)

# This works fine
coarsen_mean = da.coarsen({"x": 2, "y": 2}, coord_func="mean").reduce(
    reduce_func_with_kwargs
) 
# This raises TypeError: nanmean() got an unexpected keyword argument 'dummy'
coarsen_reduce = da.coarsen({"x": 2, "y": 2}, coord_func="mean").reduce(
    reduce_func_with_kwargs, dummy=1
)

The first reduction works as expected, as no kwargs are passed to the reduce method. However, the latter reduction raises a TypeError

Traceback (most recent call last):
  File "/home/saskartt/Code/misc/xarray_bug_example.py", line 28, in <module>
    coarsen_reduce = da.coarsen({"x": 2, "y": 2}, coord_func="mean").reduce(reduce_func_with_kwargs, dummy=1) 
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saskartt/miniconda3/envs/xarray-bug/lib/python3.11/site-packages/xarray/core/rolling.py", line 1064, in reduce
    return wrapped_func(self, keep_attrs=keep_attrs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saskartt/miniconda3/envs/xarray-bug/lib/python3.11/site-packages/xarray/core/rolling.py", line 1012, in wrapped_func
    coords[c] = v.variable.coarsen(
                ^^^^^^^^^^^^^^^^^^^
  File "/home/saskartt/miniconda3/envs/xarray-bug/lib/python3.11/site-packages/xarray/core/variable.py", line 2522, in coarsen
    return self._replace(data=func(reshaped, axis=axes, **kwargs), attrs=_attrs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saskartt/miniconda3/envs/xarray-bug/lib/python3.11/site-packages/xarray/core/duck_array_ops.py", line 615, in mean
    return _mean(array, axis=axis, skipna=skipna, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saskartt/miniconda3/envs/xarray-bug/lib/python3.11/site-packages/xarray/core/duck_array_ops.py", line 382, in f
    return func(values, axis=axis, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: nanmean() got an unexpected keyword argument 'dummy'

This is because kwarg dummy=1 gets passed to coordinate reduction function "mean" (nanmean) in addition to reduce_func_with_kwargs in xarray/core/rolling.py:1012.

Currently, it is undocumented that the kwargs are also passed to the coordinate reduction function, so this is probably technically not a bug. However, I think in many cases you might want to use different kwargs for the functions and some solution for this case would be nice.

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]

xarray: 2023.7.0
pandas: 2.0.3
numpy: 1.25.2

Cheers,
Sasu Karttunen

@saskartt saskartt added the needs triage Issue that has not been reviewed by xarray team member label Aug 9, 2023
@welcome
Copy link

welcome bot commented Aug 9, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@TomNicholas
Copy link
Member

Thanks for reporting this. Any ideas for the best way to resolve it with a PR? An extra coord_func_kwargs dict passed?

Related to #7981 in that subtleties with the whole coord_func functionality are pretty poorly documented.

@TomNicholas TomNicholas added API design topic-documentation and removed needs triage Issue that has not been reviewed by xarray team member labels Aug 9, 2023
@dcherian
Copy link
Contributor

dcherian commented Aug 9, 2023

Any ideas for the best way to resolve it with a PR? An extra coord_func_kwargs dict passed?

I think we should encourage the use of partial and not add a new kwarg. We should definitely fix the bug here though

@keewis
Copy link
Collaborator

keewis commented Aug 9, 2023

I was about to post something similar, so 👍 from me (though there are a few more options other than functools.partial, such as using a wrapper / lambda, or toolz.functoolz.curry)

@saskartt
Copy link
Author

I don't really have much knowledge on what practices are preferred in xarray API and implementation, but I would have added the kwargs for the coord_func to the .coarsen method itself as a kwarg dict, as the coord_func itself is a parameter already there. For me, it would seem more clear that the both function and its kwargs are given in the same method.

@keewis
Copy link
Collaborator

keewis commented Aug 10, 2023

exactly. We're proposing something like this:

from functools import partial

def coords_reduce(x, axis, arg1):
    ...

da.coarsen({"x": 2, "y": 2}, coord_func=partial(coords_reduce, arg1="additional data").reduce(
    reduce_func_with_kwargs, dummy=1
)

and then obviously dummy should not be passed to coord_func (it doing that at the moment is likely a bug).

That aside, though, it does feel a bit odd that we're passing coord_func in coarsen and reduce_func in .reduce, I would have expected to pass both in .reduce (I might be missing something, though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants