Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call .compute() in all plot methods? #5589

Open
johnomotani opened this issue Jul 9, 2021 · 3 comments
Open

Call .compute() in all plot methods? #5589

johnomotani opened this issue Jul 9, 2021 · 3 comments

Comments

@johnomotani
Copy link
Contributor

johnomotani commented Jul 9, 2021

I noticed what I think might be a performance bug: should .compute() be called on the input data in all the plotting methods (e.g. plot.pcolormesh()) like it is in .plot() here

darray = darray.squeeze().compute()
See also discussion in #2390.

I was making plots from a large dataset of a quantity that is the output of quite a bit of computation. A script which made an animation of the full time-series (a couple of thousand time points) actually ran significantly faster than a script that made pcolormesh plots of just 3 time points (~2hrs compared to ~5hrs). The difference I can think of is that the animation script called .values before the animation function, but the plotting script called xarray.plot.pcolormesh() without calling .values/.load()/.compute() first. A modified version of the script that calls .load() before any plot calls reduced the run time to 30 mins even though I plotted 18 time points, not just 3.

2d plots might all be covered by adding a darray = darray.compute() call in newplotfunc()?

def newplotfunc(

I guess the 1d plot functions would all need to be modified individually.

@TomNicholas
Copy link
Member

We should be calling .compute() before passing anything to matplotlib - if we're not then that's a bug. I've also noticed that not doing so can cause huge slowdown.

I think that #5561 might (or really should) fix this as a side effect though.

@TomNicholas
Copy link
Member

But let me know if you spot anywhere that #5561 isn't going to cover with to_numpy() calls!

I guess the 1d plot functions would all need to be modified individually.

@Illviljan was talking about adding a @_plot1d decorator like the 2d one, which might be a good place to enforce this.

@dcherian
Copy link
Contributor

dcherian commented Jul 9, 2021

We should create a DataArray with just the data that is needed based on x, y etc. and then call compute on the whole thing, rather than computing x separately and y separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants