[ENH] More lazy aggregators #3168


Merged: 11 commits merged into SciTools:master on Oct 5, 2018

Conversation

@DPeterK (Member) commented Sep 12, 2018

Make more aggregators lazy, specifically:

  • iris.analysis.SUM
  • iris.analysis.COUNT

and add tests for these aggregators, as well as for some existing aggregators that were missing tests of their lazy functionality.
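
For illustration, a minimal usage sketch (the file name below is hypothetical) of what this means in practice: collapsing a cube that holds lazy data with one of these aggregators should leave the result lazy until it is explicitly realised.

import iris
import iris.analysis

# Hypothetical file; iris loads the data lazily.
cube = iris.load_cube('air_temperature.nc')
print(cube.has_lazy_data())      # True

# With a lazy SUM, collapsing no longer realises the data.
total = cube.collapsed('time', iris.analysis.SUM)
print(total.has_lazy_data())     # still True

# COUNT needs a selection function, and is now lazy too.
n_warm = cube.collapsed('time', iris.analysis.COUNT,
                        function=lambda values: values > 280)
print(n_warm.has_lazy_data())    # still True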

This leaves the following aggregators non-lazy:

@DPeterK requested a review from pp-mo on September 12, 2018 16:01
@@ -1248,6 +1259,25 @@ def _sum(array, **kwargs):
    return rvalue


def _lazy_sum(array, axis, **kwargs):

I'm curious: why are we duplicating all of this logic, and not just implementing it once lazily? If we want a non-lazy form, we can just compute the result (as in, lambda _sum: _lazy_sum(*args, **kwargs).compute()), right?
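
For what it's worth, a rough sketch of that suggestion, assuming the lazy and non-lazy forms share a signature (the bodies below are illustrative, not the code in this diff):

import dask.array as da

def _lazy_sum(array, axis=None, **kwargs):
    # Single implementation, written lazily; accepts numpy or dask input.
    return da.sum(da.asarray(array), axis=axis, **kwargs)

def _sum(array, axis=None, **kwargs):
    # Non-lazy form derived from the lazy one by computing the result.
    return _lazy_sum(array, axis=axis, **kwargs).compute()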

@pelson (Member) left a comment:

Looks good. Thanks @dkillick. I've made a few comments; the most substantial relate to the testing strategy (and ultimately the testing strategy of Iris and its sustainability).

@@ -31,6 +33,19 @@
import numpy.ma as ma


def non_lazy(func):

@mrocklin Do you know if there is already such a decorator defined in dask that we could use? If not, is this a common requirement (i.e. would you like us to submit it to dask)? (e.g. dask.delayed.immediate_dispatch)

@mrocklin replied:

No such decorator exists, but depending on what you want to do, Array plugins might be useful:

http://dask.pydata.org/en/latest/array-creation.html#plugins

There is an example there for automatic computation. Again, not exactly what I'm seeing here, but depending on the underlying problem you have this might help.

@DPeterK (Member, Author) replied:

Thanks for the suggestion @mrocklin! I wonder though if Array plugins aren't quite what we're looking for in this case.
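
For context, a minimal sketch of what a non_lazy decorator along these lines typically does (illustrative only, not necessarily the exact code in this diff):

import functools

import dask

def non_lazy(func):
    # Illustrative: wrap a function that returns a lazy (dask) result so
    # that calling the wrapper computes and returns the realised result.
    @functools.wraps(func)
    def inner(*args, **kwargs):
        result = func(*args, **kwargs)
        return dask.compute(result)[0]
    return inner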

Make a wrapped dask statistic function that supports the 'mdtol' keyword.

'dask_function' must be a dask statistical function, compatible with the
call signature : "dask_stats_function(data, axis, **kwargs)".

axis=axis now that you've changed the Aggregator.
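
For illustration only, a sketch of the shape such a wrapper could take (the name make_mdtol_function and the details below are assumptions, not the code under review): the wrapped statistic masks any collapsed point where the fraction of missing input points along the collapse axis exceeds mdtol.

import dask.array as da

def make_mdtol_function(dask_stats_function):
    # Hypothetical wrapper: add 'mdtol' support to a dask statistic with
    # the call signature dask_stats_function(data, axis, **kwargs).
    def inner_stat(array, axis=-1, mdtol=None, **kwargs):
        result = dask_stats_function(array, axis=axis, **kwargs)
        if mdtol is not None and mdtol < 1:
            # Fraction of masked points along the collapse axis.
            missing_frac = da.mean(da.ma.getmaskarray(array), axis=axis)
            result = da.ma.masked_where(missing_frac > mdtol, result)
        return result
    return inner_stat

# e.g. a lazy, mdtol-aware sum: make_mdtol_function(da.sum)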

try:
    func = kwargs.pop('function')
except KeyError:
    raise KeyError('no selection function supplied.')

I'm +0 on this extra exception. Without it (assuming you do func = kwargs.pop('function', None)) we get a TypeError, which is good enough for me (I accept that KeyError gives you even more exception fidelity, but my argument is that the fidelity isn't worth the extra lines).
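
A side-by-side sketch of the two styles being weighed up (the function bodies are illustrative only, not the code in this diff):

import numpy as np

def count_explicit(array, axis, **kwargs):
    # Style in the diff: explicit pop plus a tailored KeyError.
    try:
        func = kwargs.pop('function')
    except KeyError:
        raise KeyError('no selection function supplied.')
    return np.sum(func(array), axis=axis, **kwargs)

def count_lean(array, axis, **kwargs):
    # Leaner alternative: pop with a default; calling None then fails
    # naturally with a TypeError.
    func = kwargs.pop('function', None)
    return np.sum(func(array), axis=axis, **kwargs)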


def test_no_function(self):
    with self.assertRaisesRegexp(KeyError, 'no selection function'):
        self.lazy_cube.collapsed("foo", COUNT)

I'm not that happy that the unit test is calling cube.collapsed. Could you just test that the aggregator produces the right numbers when you pass a numpy array through, and thus rely on the testing of the Aggregator class to ensure that the integration is good?
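
A sketch of the more direct style of test being suggested, exercising the aggregator on array data rather than going through cube.collapsed (the class name and expected values are illustrative; this assumes COUNT.lazy_aggregate as added in this PR):

import unittest

import dask.array as da
import numpy as np
from iris.analysis import COUNT

class Test_lazy_aggregate(unittest.TestCase):
    def test_count_above_threshold(self):
        # Count, along axis 0, the points where the selection function holds.
        data = da.from_array(np.arange(12).reshape(3, 4), chunks=2)
        result = COUNT.lazy_aggregate(data, axis=0,
                                      function=lambda values: values > 5)
        np.testing.assert_array_equal(result.compute(), [1, 1, 2, 2])

if __name__ == '__main__':
    unittest.main()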

@DPeterK added this to the v2.2.0 milestone, Oct 2, 2018
@corinnebosley merged commit 39a7d54 into SciTools:master, Oct 5, 2018