Skip to content

Conversation

@dschwoerer
Copy link
Contributor

Avoids changing the datatype if the data does not have the requested
axis.

Avoids changing the datatype if the data does not have the requested
axis.
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @dschwoerer !

I think the result is right.

It's implemented as a special-case, which isn't ideal. But I'm not sure how to abstract the rule that applies to mean but not to std — something like "does f([x]) == x". I guess it should apply to .sum too.

For other reviewers — here's an example of the std case (which this PR doesn't change, correctly):

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (pos: 3, time: 2)
Coordinates:
  * pos      (pos) int64 1 2 3
Dimensions without coordinates: time
Data variables:
    data     (pos, time) float64 1.0 2.0 3.0 4.0 5.0 6.0
    var      (pos) int64 2 3 4

In [6]: ds.std('time')
Out[6]:
<xarray.Dataset>
Dimensions:  (pos: 3)
Coordinates:
  * pos      (pos) int64 1 2 3
Data variables:
    data     (pos) float64 0.5 0.5 0.5
    var      (pos) float64 0.0 0.0 0.0    # this should be 0.0, unlike `.mean`'s result

as positional, all others need to be passed are keyword arguments. This is part of the
refactor to support external backends (:issue:`4309`, :pull:`4989`).
By `Alessandro Amici <https://github.com/alexamici>`_.
- :py:func:`mean` does not change the data if axis is None. This
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:py:func:mean is actually a method (there should be other references in the whats-new which can be copied)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found an example, so I have removed the explicit reference

ds["pos"] = [1, 2, 3]
ds["data"] = ("pos", "time"), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ds["var"] = "pos", [2, 3, 4]
ds2 = ds.mean(dim="time")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convention:

Suggested change
ds2 = ds.mean(dim="time")
result = ds.mean(dim="time")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not store the result but use it directly. Is that ok as well?

ds = Dataset()
ds["pos"] = [1, 2, 3]
ds["data"] = ("pos", "time"), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ds["var"] = "pos", [2, 3, 4]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you can construct the dataset in a single expression, but it's also fine as-is (Dataset(dict(pos=..., data=...)))

* Better testing
* Clarify comment
* Handle other functions as well, like sum, min, max
@dschwoerer
Copy link
Contributor Author

I have now changed so that several wrapped functions preserve the data.

It is more generic, and hopefully still readable. The flag might also be called identity_0d because the function is the identity function for 0d data, while invariant_0d means that any 0d data is invariant under this function.

@max-sixty
Copy link
Collaborator

Excellent! Thanks @dschwoerer .

Any thoughts from others before we merge, particularly on the identity_0d approach?

@max-sixty
Copy link
Collaborator

Merging — as ever, any additional feedback we can address in another PR.

Thank you very much @dschwoerer , we're happy to have you as a contributor!

@max-sixty max-sixty merged commit 24c357f into pydata:master Apr 24, 2021
@keewis keewis mentioned this pull request Jun 10, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dataset.mean changes variables without specified dimension

2 participants