Dataset.__repr__ causing dask to be computed. #820

Closed
@andrewdhicks

Printing a Dataset object shows array values (#206), but it also forces any dask-backed array to be computed so that the data variable values can be displayed:

<xarray.Dataset>
Dimensions:  (time: 182, x: 420, y: 489)
Coordinates:
  * time     (time) datetime64[ns] 1990-03-02T23:11:16 1990-03-02T23:11:39 ...
  * y        (y) float64 -3.919e+06 -3.919e+06 -3.919e+06 -3.919e+06 ...
  * x        (x) float64 1.538e+06 1.538e+06 1.538e+06 1.538e+06 1.538e+06 ...
Data variables:
    band_20  (time, y, x) float64 729.0 612.0 579.0 629.0 862.0 1.027e+03 ...
    band_10  (time, y, x) float64 476.0 375.0 357.0 416.0 586.0 708.0 730.0 ...
    band_50  (time, y, x) float64 2.175e+03 634.0 245.0 546.0 1.788e+03 ...
...

Printing a DataArray, by contrast, does not compute the values of a dask-backed array:

<xarray.DataArray u'ls5_nbar_albers' (time: 182, y: 489, x: 420)>
dask.array<concate..., shape=(182, 489, 420), dtype=float64, chunksize=(1, 489, 420)>
Coordinates:
  * time      (time) datetime64[ns] 1990-03-02T23:11:16 1990-03-02T23:11:39 ...
  * y         (y) float64 -3.919e+06 -3.919e+06 -3.919e+06 -3.919e+06 ...
  * x         (x) float64 1.538e+06 1.538e+06 1.538e+06 1.538e+06 1.538e+06 ...

There is a check to make sure the data is not remote, but it does not take dask status into account; see: https://github.com/pydata/xarray/blob/master/xarray/core/formatting.py#L173
Is there a way to indicate that computing a particular dask array is expensive, so that its values are not calculated just to build the repr?
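
To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not xarray's code: `LazyArray` and `summarize_variable` are hypothetical names used only for illustration, and the sketch assumes lazy data can be recognised by the presence of a `compute` method. The idea is that the per-variable summary line falls back to a placeholder repr for lazy data, rather than loading values.

```python
class LazyArray:
    """Hypothetical stand-in for a dask-backed array: values only exist after compute()."""

    def __init__(self, shape, loader):
        self.shape = shape
        self._loader = loader
        self.compute_calls = 0  # counts how often the data was actually loaded

    def compute(self):
        self.compute_calls += 1
        return self._loader()

    def __repr__(self):
        # Describe the array without touching its values.
        return f"LazyArray<shape={self.shape}>"


def summarize_variable(name, data, max_values=3):
    """Format one Dataset-style summary line without forcing lazy data (hypothetical helper)."""
    if hasattr(data, "compute"):
        # Assumption: anything exposing compute() is lazy and expensive to load,
        # so show its placeholder repr rather than its values.
        return f"{name}  {data!r}"
    # In-memory data: previewing the first few values is cheap.
    preview = " ".join(str(v) for v in data[:max_values])
    return f"{name}  {preview} ..."


lazy = LazyArray((182, 489, 420), loader=lambda: [729.0, 612.0, 579.0])
line_lazy = summarize_variable("band_20", lazy)
line_eager = summarize_variable("band_10", [476.0, 375.0, 357.0, 416.0])

print(line_lazy)
print(line_eager)
print("compute calls:", lazy.compute_calls)
```

With a check like this, the Dataset repr would treat dask-backed variables the same way the DataArray repr above already does, and only in-memory variables would show a preview of their values.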
