
Large coordinate arrays trigger computation #3454

Closed
@jbusecke


I want to bring up an issue that has tripped up my workflow with large climate models many times. I am dealing with large data arrays of vertical cell thickness. These are 4D arrays (x, y, z, time), but I would define them as coordinates, not data_variables, in the xarray data model (e.g. they should not be multiplied by a value if a dataset is multiplied).
These sorts of coordinates might become more prevalent with newer ocean models like MOM6.

Whenever I assign these arrays as coordinates, operations on the arrays seem to trigger computation, whereas they don't if I set them up as data_variables. The example below shows this behavior.
Is this a bug or intended behavior? Is there a workaround to keep these vertical thicknesses as coordinates?

import xarray as xr
import numpy as np
import dask.array as dsa

# create dataset with vertical thickness `dz` as data variable
data = xr.DataArray(dsa.random.random([30, 50, 200, 1000]), dims=['x','y', 'z', 't'])
dz = xr.DataArray(dsa.random.random([30, 50, 200, 1000]), dims=['x','y', 'z', 't'])
ds = xr.Dataset({'data':data, 'dz':dz})

# another dataset with `dz` as coordinate
ds_new = xr.Dataset({'data':data})
ds_new.coords['dz'] = dz

%%time
test = ds['data'] * ds['dz']

CPU times: user 1.94 ms, sys: 19.1 ms, total: 21 ms
Wall time: 21.6 ms

%%time
test = ds_new['data'] * ds_new['dz']

CPU times: user 17.4 s, sys: 1.98 s, total: 19.4 s
Wall time: 12.5 s
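The timings above only hint that something is being computed. A more direct check is to count how often dask is actually asked to run a graph. The sketch below is one possible way to do that, not an xarray feature: `ComputeCounter` is a hypothetical helper that wraps dask's synchronous scheduler, and the arrays are shrunk so the check runs quickly. It also tries one possible workaround, which is an assumption on my part rather than a confirmed fix: drop `dz` from both operands before multiplying, then reattach it afterwards. Whether the coordinate case reports any computes will likely depend on the xarray version.

```python
import dask
import dask.array as dsa
import xarray as xr


class ComputeCounter:
    """Hypothetical helper: a dask scheduler that counts compute requests."""

    def __init__(self):
        self.count = 0

    def __call__(self, dsk, keys, **kwargs):
        self.count += 1
        # Delegate the actual work to dask's synchronous scheduler.
        return dask.get(dsk, keys, **kwargs)


# Small arrays with the same layout as in the example above.
shape = (3, 5, 20, 10)
dims = ["x", "y", "z", "t"]
data = xr.DataArray(dsa.random.random(shape, chunks=shape), dims=dims)
dz = xr.DataArray(dsa.random.random(shape, chunks=shape), dims=dims)

ds = xr.Dataset({"data": data, "dz": dz})  # `dz` as data variable
ds_new = xr.Dataset({"data": data})
ds_new.coords["dz"] = dz  # `dz` as coordinate

# Case 1: `dz` is a data variable.
var_counter = ComputeCounter()
with dask.config.set(scheduler=var_counter):
    lazy_var = ds["data"] * ds["dz"]

# Case 2: `dz` is a coordinate attached to both operands.
coord_counter = ComputeCounter()
with dask.config.set(scheduler=coord_counter):
    lazy_coord = ds_new["data"] * ds_new["dz"]

# Case 3 (assumed workaround): strip `dz` from both operands so there is
# no shared coordinate to compare, multiply, then put `dz` back.
drop_counter = ComputeCounter()
with dask.config.set(scheduler=drop_counter):
    left = ds_new["data"].reset_coords("dz", drop=True)
    right = ds_new["dz"].reset_coords(drop=True)
    lazy_dropped = (left * right).assign_coords(dz=dz)

print("computes with dz as data variable:", var_counter.count)
print("computes with dz as coordinate:", coord_counter.count)
print("computes with dz dropped and reattached:", drop_counter.count)
```

A non-zero count in the second case would confirm that the multiplication itself triggers the computation, rather than something later in the workflow.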

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.13.0+24.g4254b4af
pandas: 0.25.1
numpy: 1.17.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.0
distributed: 2.5.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.2.0
IPython: 7.8.0
sphinx: None
