
BUG: fix + test open_mfdataset fails on variable attributes with list… #3181


Merged
merged 11 commits into from
Aug 4, 2019
10 changes: 6 additions & 4 deletions doc/whats-new.rst
@@ -61,8 +61,12 @@ Bug fixes
   By `Tom Nicholas <http://github.com/TomNicholas>`_.
 - Fixed crash when applying ``distributed.Client.compute()`` to a DataArray
   (:issue:`3171`). By `Guido Imperiale <https://github.com/crusaderky>`_.
-
-
+- Better error message when using groupby on an empty DataArray (:issue:`3037`).
+  By `Hasan Ahmad <https://github.com/HasanAhmadQ7>`_.
+- Fix error that arises when using open_mfdataset on a series of netcdf files
+  having differing values for a variable attribute of type list. (:issue:`3034`)
+  By `Hasan Ahmad <https://github.com/HasanAhmadQ7>`_.
+
 .. _whats-new.0.12.3:
 
 v0.12.3 (10 July 2019)
@@ -103,8 +107,6 @@ Bug fixes
 - Fix HDF5 error that could arise when reading multiple groups from a file at
   once (:issue:`2954`).
   By `Stephan Hoyer <https://github.com/shoyer>`_.
-- Better error message when using groupby on an empty DataArray (:issue:`3037`).
-  By `Hasan Ahmad <https://github.com/HasanAhmadQ7>`_.
 
 .. _whats-new.0.12.2:
 
15 changes: 14 additions & 1 deletion xarray/core/utils.py
@@ -129,18 +129,31 @@ def maybe_wrap_array(original, new_array):
 
 def equivalent(first: T, second: T) -> bool:
     """Compare two objects for equivalence (identity or equality), using
-    array_equiv if either object is an ndarray
+    array_equiv if either object is an ndarray. If both objects are lists,
+    equivalent is sequentially called on all the elements.
     """
     # TODO: refactor to avoid circular import
     from . import duck_array_ops
     if isinstance(first, np.ndarray) or isinstance(second, np.ndarray):
         return duck_array_ops.array_equiv(first, second)
+    elif isinstance(first, list) or isinstance(second, list):
+        return list_equiv(first, second)
     else:
         return ((first is second) or
                 (first == second) or
                 (pd.isnull(first) and pd.isnull(second)))
 
 
+def list_equiv(first, second):
+    equiv = True
+    if len(first) != len(second):
+        return False
+    else:
+        for f, s in zip(first, second):
+            equiv = equiv and equivalent(f, s)
+    return equiv
+
+
 def peek_at(iterable: Iterable[T]) -> Tuple[T, Iterator[T]]:
     """Returns the first value from iterable, as well as a new iterator with
     the same content as the original iterable
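
The `xarray/core/utils.py` change above makes `equivalent` recurse into lists element by element. The following is a minimal standalone sketch of that idea, not the merged implementation: `equivalent_sketch` and `is_missing` are hypothetical names, `is_missing` is a standard-library stand-in for `pd.isnull`, and the explicit "both arguments must be lists" guard is an assumption of this sketch (the diff dispatches to `list_equiv` when either argument is a list).

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Return True for None or a float NaN (stdlib stand-in for pd.isnull)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def equivalent_sketch(first: Any, second: Any) -> bool:
    """Compare two attribute values, recursing into lists element by element."""
    if isinstance(first, list) or isinstance(second, list):
        # Assumption of this sketch: a list is never equivalent to a non-list.
        if not (isinstance(first, list) and isinstance(second, list)):
            return False
        if len(first) != len(second):
            return False
        # all() stops at the first mismatch instead of scanning the whole list.
        return all(equivalent_sketch(f, s) for f, s in zip(first, second))
    return first is second or first == second or (
        is_missing(first) and is_missing(second)
    )


# List attributes shaped like the ones written by the test in this PR.
attrs_file_0 = ["string a 0", "string b 0"]
attrs_file_1 = ["string a 1", "string b 1"]
print(equivalent_sketch(attrs_file_0, attrs_file_1))        # False
print(equivalent_sketch(attrs_file_0, list(attrs_file_0)))  # True
```

Because the scalar branch is only reached for non-list values, differing lists yield a plain `False` rather than being passed to a null check that may not return a single boolean for sequences.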
24 changes: 24 additions & 0 deletions xarray/tests/test_backends.py
@@ -2361,6 +2361,30 @@ def test_open_mfdataset_manyfiles(readengine, nfiles, parallel, chunks,
             assert_identical(original, actual)
 
 
+@requires_netCDF4
+def test_open_mfdataset_list_attr():
+    """
+    Case when an attribute of type list differs across the multiple files
+    """
+    from netCDF4 import Dataset
+    with create_tmp_files(2) as nfiles:
+        for i in range(2):
+            f = Dataset(nfiles[i], "w")
+            f.createDimension("x", 3)
+            vlvar = f.createVariable("test_var", np.int32, ("x"))
+            # here create an attribute as a list
+            vlvar.test_attr = ["string a {}".format(i),
+                               "string b {}".format(i)]
+            vlvar[:] = np.arange(3)
+            f.close()
+        ds1 = open_dataset(nfiles[0])
+        ds2 = open_dataset(nfiles[1])
+        original = xr.concat([ds1, ds2], dim='x')
+        with xr.open_mfdataset([nfiles[0], nfiles[1]], combine='nested',
+                               concat_dim='x') as actual:
+            assert_identical(actual, original)
+
+
 @requires_scipy_or_netCDF4
 @requires_dask
 class TestOpenMFDatasetWithDataVarsAndCoordsKw: