Iterable[Dataset | DataArray] issue with xarray.core.combine.combine_by_coords #10114

Open
@leaver2000

What is your issue?

I want to highlight a typing issue with xarray.core.combine.combine_by_coords, whose first parameter is annotated as data_objects: Iterable[Dataset | DataArray] = []. The annotation accepts any Iterable, but when a generator (which is an Iterable) is passed as data_objects, the function returns an empty Dataset instead of the combined result. See the example below.

from typing import Iterable
import xarray as xr

def dataset_maker(key: str) -> xr.Dataset:
    return xr.Dataset(
        data_vars={
            key: xr.DataArray(data=[1, 2, 3], dims=["x"])
        },
        coords=dict(
            x=xr.DataArray(data=[1, 2, 3], dims=["x"]),
        ),
    )


data_list = [dataset_maker('a'), dataset_maker('b')]
assert isinstance(data_list, Iterable)
print("... { data_list } ...")
print(xr.combine_by_coords(data_list))

data_tuple = dataset_maker('a'), dataset_maker('b')
assert isinstance(data_tuple, Iterable)
print("... { data_tuple } ...")
print(xr.combine_by_coords(data_tuple))

data_iterator = (dataset_maker(k) for k in ['a', 'b'])
assert isinstance(data_iterator, Iterable)
print("... { data_iterator } ...")
print(xr.combine_by_coords(data_iterator))

The results (the list and the tuple combine correctly; the generator yields an empty Dataset):

... { data_list } ...
<xarray.Dataset> Size: 72B
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 24B 1 2 3
Data variables:
    a        (x) int64 24B 1 2 3
    b        (x) int64 24B 1 2 3
... { data_tuple } ...
<xarray.Dataset> Size: 72B
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 24B 1 2 3
Data variables:
    a        (x) int64 24B 1 2 3
    b        (x) int64 24B 1 2 3
... { data_iterator } ...
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

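The issue arises at the head of the function, where the list comprehension below iterates over data_objects. When data_objects is a generator, this first pass exhausts it, so nothing is left for the rest of the function to combine.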

    objs_are_unnamed_dataarrays = [
        isinstance(data_object, DataArray) and data_object.name is None
        for data_object in data_objects
    ]
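
To illustrate the single-pass behavior independently of xarray, here is a minimal sketch in plain Python. The generator function `numbers` is a hypothetical stand-in and is not part of xarray; the point is only that an object can satisfy `isinstance(obj, Iterable)` and still be consumable just once.

```python
from collections.abc import Iterable


def numbers():
    # Hypothetical stand-in for a generator of datasets.
    yield from (1, 2, 3)


gen = numbers()
assert isinstance(gen, Iterable)  # a generator passes the Iterable check

first_pass = [n for n in gen]   # consumes every item
second_pass = [n for n in gen]  # the generator is already exhausted

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

The second comprehension mirrors what any later loop over data_objects would see after the first comprehension has run.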

A simple solution would be either to narrow the type annotation to Sequence, or to materialize data_objects into a list at the top of the function so it can be iterated more than once:

data_objects = list(data_objects)
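
As a rough sketch of the second option, the snippet below uses a hypothetical two-pass function, `count_then_collect`, as a stand-in for a function that walks its input twice. It is not xarray code, and the `materialize` flag exists only to compare both behaviors side by side.

```python
from collections.abc import Iterable


def count_then_collect(items: Iterable[int], materialize: bool) -> tuple[int, list[int]]:
    """Hypothetical stand-in for a function that iterates over its input twice."""
    if materialize:
        # Consume the iterable once up front; the list can be re-iterated freely.
        items = list(items)
    flags = [item % 2 == 0 for item in items]  # first pass
    collected = [item for item in items]       # second pass
    return sum(flags), collected


without_fix = count_then_collect((n for n in range(4)), materialize=False)
with_fix = count_then_collect((n for n in range(4)), materialize=True)

print(without_fix)  # (2, [])
print(with_fix)     # (2, [0, 1, 2, 3])
```

Materializing keeps the Iterable annotation honest at the cost of holding all items in memory at once. Narrowing the annotation to Sequence instead would let a type checker flag generator arguments, without changing runtime behavior.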
