xindexes set incorrectly for mfdataset with dask client and parallel=True #5686

Closed
@aidanheerdegen

What happened: Calling open_mfdataset with parallel=True while a dask.distributed client is active fails to set .xindexes correctly.

What you expected to happen: The indexes should contain an index that can be printed correctly. Instead, calling repr on .xindexes fails with TypeError: cannot compute the time difference between dates with different calendars. The error is raised inside .asi8.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np
from dask.distributed import Client

# Need a main routine for dask.distributed if run as script
if __name__ == "__main__":

    client = Client(n_workers=1) 

    # Create some synthetic data
    time_365_decade = xr.cftime_range(start="2100", periods=120, freq="1MS", calendar="noleap")

    ds = xr.Dataset(
            {"a": ("time", np.arange(time_365_decade.size))},
            coords={"time": time_365_decade},
    )   

    index_microseconds = ds.xindexes['time'].array.asi8

    # Save to a file per year
    years, datasets = zip(*ds.groupby("time.year"))
    xr.save_mfdataset(datasets, [f"{y}.nc" for y in years])

    # Open saved files, parallel=False and asi8 ok
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=False).xindexes['time'].array.asi8).all()

    # Open saved files, parallel=True and asi8 fails
    assert (index_microseconds == xr.open_mfdataset('2???.nc', parallel=True).xindexes['time'].array.asi8).all()

Anything else we need to know?: The failure is in asi8, at

https://github.com/pydata/xarray/blob/main/xarray/coding/cftimeindex.py#L677

because

epoch = self.date_type(1970, 1, 1)

returns a cftime.datetime whose calendar and has_year_zero attributes do not match those of the index:

(Pdb) p epoch
cftime.datetime(1970, 1, 1, 0, 0, 0, 0, calendar='gregorian', has_year_zero=False)
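The mechanism described above can be sketched with a toy model that uses only the standard library. This is an illustration under stated assumptions, not the xarray or cftime implementation: ToyDate and toy_asi8 are hypothetical names, the calendar arithmetic is deliberately simplified, and the only behaviour borrowed from the report is that the epoch is built from the date type alone and so picks up that type's default calendar.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ToyDate:
    """Hypothetical stand-in for a calendar-aware date (not cftime's API)."""

    year: int
    month: int
    day: int
    calendar: str = "gregorian"  # assumed default, mirroring the pdb output

    def __sub__(self, other):
        # Refuse to subtract dates that live in different calendars.
        if self.calendar != other.calendar:
            raise TypeError(
                "cannot compute the time difference between dates "
                "with different calendars"
            )
        # Simplification: reuse the standard-library calendar for the
        # difference instead of implementing real noleap arithmetic.
        return datetime(self.year, self.month, self.day) - datetime(
            other.year, other.month, other.day
        )


def toy_asi8(dates, date_type):
    """Microseconds since 1970-01-01, with the epoch built from date_type only."""
    epoch = date_type(1970, 1, 1)  # calendar silently falls back to the default
    return [(d - epoch) // timedelta(microseconds=1) for d in dates]


# Matching calendars: the subtraction succeeds.
same_calendar = toy_asi8([ToyDate(1970, 1, 2)], ToyDate)

# Mismatched calendars: the epoch is "gregorian" but the data are "noleap".
message = None
try:
    toy_asi8([ToyDate(2100, 1, 1, calendar="noleap")], ToyDate)
except TypeError as err:
    message = str(err)
```

In this sketch the failure goes away as soon as the epoch is constructed with the same calendar as the data, which is consistent with the report: the index contents are fine, but the type used to build the epoch no longer carries the index's calendar.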

I previously reported this as #5677.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.7.1.el8.nci.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: en_AU.ISO8859-1
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.1
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 2.10.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: 0.9.9.0
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 52.0.0.post20210125
pip: 21.1.3
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2
