Dataset.__repr__ causing dask to be computed. #820

Closed
andrewdhicks opened this issue Apr 8, 2016 · 3 comments

@andrewdhicks

Printing a Dataset object shows a preview of array values (#206), but for a dask-backed array this forces the array to be computed just so the data variable values can be displayed:

<xarray.Dataset>
Dimensions:  (time: 182, x: 420, y: 489)
Coordinates:
  * time     (time) datetime64[ns] 1990-03-02T23:11:16 1990-03-02T23:11:39 ...
  * y        (y) float64 -3.919e+06 -3.919e+06 -3.919e+06 -3.919e+06 ...
  * x        (x) float64 1.538e+06 1.538e+06 1.538e+06 1.538e+06 1.538e+06 ...
Data variables:
    band_20  (time, y, x) float64 729.0 612.0 579.0 629.0 862.0 1.027e+03 ...
    band_10  (time, y, x) float64 476.0 375.0 357.0 416.0 586.0 708.0 730.0 ...
    band_50  (time, y, x) float64 2.175e+03 634.0 245.0 546.0 1.788e+03 ...
...

Printing a dask-backed DataArray, by contrast, does not compute any values:

<xarray.DataArray u'ls5_nbar_albers' (time: 182, y: 489, x: 420)>
dask.array<concate..., shape=(182, 489, 420), dtype=float64, chunksize=(1, 489, 420)>
Coordinates:
  * time      (time) datetime64[ns] 1990-03-02T23:11:16 1990-03-02T23:11:39 ...
  * y         (y) float64 -3.919e+06 -3.919e+06 -3.919e+06 -3.919e+06 ...
  * x         (x) float64 1.538e+06 1.538e+06 1.538e+06 1.538e+06 1.538e+06 ...

There is a check to make sure the data is not remote, but it does not take into account whether the data is dask-backed; see: https://github.com/pydata/xarray/blob/master/xarray/core/formatting.py#L173
Is there a way to indicate that computing a particular dask array is expensive and that it should not be computed just for display?

@shoyer
Member

shoyer commented Apr 8, 2016

Showing a preview of values in a dask dataset can be very convenient for interactive use, so I'm loath to turn it off entirely. I agree that it's not always useful, though.

The current check for remote data is a complete hack that should probably be removed :).

What do you think about adding a user-configurable option to disable printing lazily computed values in datasets? You could then write something like xarray.set_options(preview_lazy_data=False):
http://xarray.pydata.org/en/stable/generated/xarray.set_options.html

@andrewdhicks
Author

That sounds like a great solution. I'll get a PR up with the preview_lazy_data option preventing formatting when var._in_memory is false.
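
For readers following along, here is a minimal, self-contained sketch of the idea discussed above. It is not xarray's implementation: `OPTIONS`, `set_options`, `LazyVariable`, `in_memory` and `summarize` are all hypothetical stand-ins written for illustration, and `preview_lazy_data` is only the option name proposed in this thread. The sketch assumes the formatter can ask a variable whether its values are already in memory and skips the value preview when they are not and the option is off.

```python
from contextlib import contextmanager

# Hypothetical global options, loosely modeled on the set_options pattern.
OPTIONS = {"preview_lazy_data": True}


@contextmanager
def set_options(**kwargs):
    """Temporarily override options; unknown option names raise ValueError."""
    unknown = set(kwargs) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    old = dict(OPTIONS)
    OPTIONS.update(kwargs)
    try:
        yield
    finally:
        # Restore the previous settings even if the body raised.
        OPTIONS.clear()
        OPTIONS.update(old)


class LazyVariable:
    """Stand-in for a lazily evaluated variable (for example, dask-backed)."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader
        self._values = None
        self.compute_count = 0  # how many times the expensive load ran

    @property
    def in_memory(self):
        return self._values is not None

    def load(self):
        # Run the (potentially expensive) computation at most once.
        if self._values is None:
            self.compute_count += 1
            self._values = list(self._loader())
        return self._values


def summarize(var, max_items=3):
    """Return a one-line summary, previewing values only when allowed."""
    if not var.in_memory and not OPTIONS["preview_lazy_data"]:
        return f"{var.name}  ..."
    values = var.load()
    preview = " ".join(str(v) for v in values[:max_items])
    return f"{var.name}  {preview} ..."
```

With the option disabled inside `with set_options(preview_lazy_data=False):`, calling `summarize` on a variable that has not been loaded returns only its name and never triggers the loader; outside the block the default preview behaviour is restored.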

@stale

stale bot commented Jan 28, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Jan 28, 2019
@dcherian dcherian removed the stale label Feb 19, 2019
@dcherian dcherian closed this as completed Oct 1, 2019