Description
What happened: Using open_mfdataset with parallel=True while a dask.distributed client is active fails to set .xindexes correctly.
What you expected to happen: The indexes should contain an index that can be printed correctly. Instead, calling repr on .xindexes fails with TypeError: cannot compute the time difference between dates with different calendars, due to an error in .asi8.
Minimal Complete Verifiable Example:
import xarray as xr
import numpy as np
from dask.distributed import Client
# Need a main routine for dask.distributed if run as script
if __name__ == "__main__":
    client = Client(n_workers=1)

    # Create some synthetic data
    time_365_decade = xr.cftime_range(start="2100", periods=120, freq="1MS", calendar="noleap")
    ds = xr.Dataset(
        {"a": ("time", np.arange(time_365_decade.size))},
        coords={"time": time_365_decade},
    )
    index_microseconds = ds.xindexes['time'].array.asi8

    # Save to a file per year
    years, datasets = zip(*ds.groupby("time.year"))
    xr.save_mfdataset(datasets, [f"{y}.nc" for y in years])

    # Open saved files, parallel=False and asi8 ok
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=False).xindexes['time'].array.asi8).all()

    # Open saved files, parallel=True and asi8 fails
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=True).xindexes['time'].array.asi8).all()
Anything else we need to know?: The asi8 property fails at
https://github.com/pydata/xarray/blob/main/xarray/coding/cftimeindex.py#L677
because
epoch = self.date_type(1970, 1, 1)
returns a cftime.datetime whose calendar and has_year_zero attributes do not match the index:
(Pdb) p epoch
cftime.datetime(1970, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)
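The mechanism described above can be illustrated with a toy model that uses only the standard library. This is a sketch, not the xarray or cftime code: `ToyDate`, `noleap` and `toy_asi8` are hypothetical stand-ins, and the month-count difference is an arbitrary simplification. It only shows how an epoch built from a date type with the wrong default calendar makes the subtraction fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDate:
    """Hypothetical stand-in for a calendar-aware date (not cftime.datetime)."""

    year: int
    month: int
    day: int
    calendar: str = "gregorian"

    def __sub__(self, other):
        # Mirror the rule seen in the error message: dates on different
        # calendars cannot be subtracted.
        if self.calendar != other.calendar:
            raise TypeError(
                "cannot compute the time difference between dates "
                "with different calendars"
            )
        # Whole-month difference; a simplification for illustration only.
        return (self.year - other.year) * 12 + (self.month - other.month)


def noleap(year, month, day):
    """Hypothetical date type that always uses the 'noleap' calendar."""
    return ToyDate(year, month, day, calendar="noleap")


def toy_asi8(dates, date_type):
    """Offsets from a 1970-01-01 epoch built by calling date_type."""
    epoch = date_type(1970, 1, 1)
    return [d - epoch for d in dates]


index = [noleap(2100, m, 1) for m in (1, 2, 3)]

# Epoch built with the matching calendar: subtraction works.
ok = toy_asi8(index, noleap)

# Epoch built from the generic type defaults to 'gregorian': subtraction fails.
try:
    toy_asi8(index, ToyDate)
    error = None
except TypeError as exc:
    error = str(exc)
```

In this model the failure depends only on which date type is used to build the epoch, which matches the pdb output above where the epoch carries a calendar different from the index's "noleap" calendar.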
This was previously reported as #5677.
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.7.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: en_AU.ISO8859-1
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.1
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 2.10.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: 0.9.9.0
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.1.3
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2