-
Notifications
You must be signed in to change notification settings - Fork 295
[ENH] More lazy aggregators #3168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/iris/analysis/__init__.py
Outdated
@@ -1248,6 +1259,25 @@ def _sum(array, **kwargs): | |||
return rvalue | |||
|
|||
|
|||
def _lazy_sum(array, axis, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious: why are we duplicating all of this logic, and not just implementing it once lazily? If we want a non-lazy form, we can just compute the result (as in, lambda _sum: _lazy_sum(*args, **kwargs).compute()
), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @dkillick. I've made a few comments, the most substantial relate to the testing strategy (and ultimately the testing strategy of Iris & its sustainability)
@@ -31,6 +33,19 @@ | |||
import numpy.ma as ma | |||
|
|||
|
|||
def non_lazy(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrocklin Do you know if there is already such a decorator defined in dask that we could use? If not, is this a common requirement (i.e. would you like us to submit it to dask)? (e.g. dask.delayed.immediate_dispatch
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No such decorator exists, but depending on what you want to do, Array plugins might be useful:
http://dask.pydata.org/en/latest/array-creation.html#plugins
There is an example there for automatic computation. Again, not exactly what I'm seeing here, but depending on the underlying problem you have this might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @mrocklin! I wonder though if Array plugins aren't quite what we're looking for in this case.
lib/iris/analysis/__init__.py
Outdated
Make a wrapped dask statistic function that supports the 'mdtol' keyword. | ||
|
||
'dask_function' must be a dask statistical function, compatible with the | ||
call signature : "dask_stats_function(data, axis, **kwargs)". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=axis
now that you've changed the Aggregator.
lib/iris/analysis/__init__.py
Outdated
try: | ||
func = kwargs.pop('function') | ||
except KeyError: | ||
raise KeyError('no selection function supplied.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +0 on this extra exception. Without it (assuming you do func = kwargs.pop('function', None)
we get a TypeError, which is good enough for me (I accept that KeyError give you even more exception fidelity, but my argument is that the fidelity isn't worth the extra lines).
|
||
def test_no_function(self): | ||
with self.assertRaisesRegexp(KeyError, 'no selection function'): | ||
self.lazy_cube.collapsed("foo", COUNT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not that happy that the unit test is calling cube.collapsed
. Could you just test that the aggregator produces the right numbers when you pass a numpy array through, and thus rely on the testing of the Aggregator class to ensure that the integration is good?
Make more aggregators lazy, specifically:
and add testing for these aggregators, and some existing aggregators that were missing lazy functionality testing.
This leaves the following aggregators non-lazy: