
Combining two named DataArrays results in a Dataset (combine_by_coords) #10774

@etienneschalk

Description


What happened?

When combining two DataArrays that are both named 'name' with combine_by_coords, I got a Dataset containing a single data variable 'name', whereas I expected a DataArray. The DataArrays are named because they were extracted from Datasets.

What did you expect to happen?

I expected that combining two DataArrays would result in a DataArray, even if they have a name.

Minimal Complete Verifiable Example

Steps to reproduce

import xarray as xr 
xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.13.1 (main, Feb 10 2025, 10:59:30) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.9.0
pandas: 2.3.2
numpy: 2.3.3
scipy: 1.16.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.9.0
distributed: None
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.1.1
conda: None
pytest: 8.4.2
mypy: None
IPython: 9.5.0
sphinx: None

Case A: combining 2 unnamed DataArrays

Expected: 2 DataArrays produce 1 DataArray
Actual: OK

xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
result = xr.combine_by_coords([xda1, xda2])
print(result)
<xarray.DataArray (x: 2)> Size: 16B
array([1, 2])
Coordinates:
  * x        (x) int64 16B 1 2

Case B: combining 2 named DataArrays extracted from Datasets

Both extracted DataArrays are named 'name':

xda1_from_dataset = xr.Dataset(data_vars={"name": xda1})["name"]
xda2_from_dataset = xr.Dataset(data_vars={"name": xda2})["name"]
print(xda1_from_dataset)
print(xda2_from_dataset)
<xarray.DataArray 'name' (x: 1)> Size: 8B
array([1])
Coordinates:
  * x        (x) int64 8B 1
<xarray.DataArray 'name' (x: 1)> Size: 8B
array([2])
Coordinates:
  * x        (x) int64 8B 2

Expected: 2 named DataArrays produce 1 named DataArray
Actual: the result is a Dataset with a variable 'name'

result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset])
print(result)
<xarray.Dataset> Size: 32B
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 16B 1 2
Data variables:
    name     (x) int64 16B 1 2
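A compact, self-contained version of Cases A and B that only reports the returned type may make the difference easier to see. It rebuilds the same inputs as above; the result variable names (`unnamed_result`, `named_result`) are illustrative.

```python
import xarray as xr

# Same inputs as Case A (unnamed) and Case B (named 'name', extracted from Datasets)
xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
named1 = xr.Dataset({"name": xda1})["name"]
named2 = xr.Dataset({"name": xda2})["name"]

unnamed_result = xr.combine_by_coords([xda1, xda2])
named_result = xr.combine_by_coords([named1, named2])

# Only the type of the result differs between the two cases
print(type(unnamed_result).__name__)  # DataArray
print(type(named_result).__name__)    # Dataset
print(list(named_result.data_vars))   # ['name']
```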

Squeezing does not turn the Dataset into a DataArray:

result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset]).squeeze()
print(result)
<xarray.Dataset> Size: 32B
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 16B 1 2
Data variables:
    name     (x) int64 16B 1 2
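This is expected from squeeze() itself: it only drops dimensions of length 1, and `x` has length 2 here, so it never converts a Dataset into a DataArray. A sketch of one possible conversion, assuming `Dataset.to_dataarray` is available (it is in the xarray 2025.9.0 reported here; older releases call it `to_array`): stack the data variables along a new `variable` dimension, which has length 1 in this case, then squeeze that dimension away.

```python
import xarray as xr

xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
ds = xr.combine_by_coords(
    [xr.Dataset({"name": xda1})["name"], xr.Dataset({"name": xda2})["name"]]
)

# squeeze() removes length-1 dimensions only; "x" has length 2, so nothing changes
print(type(ds.squeeze()).__name__)  # Dataset

# to_dataarray() stacks data variables along a new "variable" dimension (length 1 here);
# squeeze it away and drop the leftover scalar "variable" coordinate
da = ds.to_dataarray(dim="variable").squeeze("variable", drop=True)
da = da.rename("name")  # to_dataarray() returns an unnamed DataArray by default
print(da)
```

This only yields a sensible result when the Dataset holds exactly one data variable; with several variables the `variable` dimension would not have length 1 and would be kept.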

The only way to get a DataArray back is to select the variable manually, which means the user has to know its name:

result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset])["name"]
print(result)
<xarray.DataArray 'name' (x: 2)> Size: 16B
array([1, 2])
Coordinates:
  * x        (x) int64 16B 1 2
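To avoid hard-coding the variable name, the selection can be wrapped in a small helper that unwraps a single-variable Dataset. The name `as_single_dataarray` is hypothetical and not part of xarray; this is a sketch of a user-side workaround, not a fix for combine_by_coords.

```python
import xarray as xr


def as_single_dataarray(obj):
    """Hypothetical helper (not part of xarray): return a DataArray unchanged,
    or the only data variable of a Dataset."""
    if isinstance(obj, xr.DataArray):
        return obj
    names = list(obj.data_vars)
    if len(names) != 1:
        raise ValueError(f"expected exactly one data variable, got {names}")
    return obj[names[0]]


xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
combined = xr.combine_by_coords(
    [xr.Dataset({"name": xda1})["name"], xr.Dataset({"name": xda2})["name"]]
)

result = as_single_dataarray(combined)
print(result.name)  # name
```

Raising on zero or several data variables is deliberate: in those cases there is no single DataArray to return, and silently picking one would hide the problem.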

Is there something I missed regarding combine_by_coords?

The current behaviour astonished me. In my opinion, combining two DataArrays, named or not, should produce a DataArray rather than a Dataset, but maybe I'm wrong.
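From Cases A and B it looks like combine_by_coords returns a DataArray only when every input is an unnamed DataArray; as far as I can tell, the Returns section of its docstring describes the same rule. Under that assumption, another possible workaround is to drop the names before combining and restore the shared name afterwards. The helper `combine_named_dataarrays` below is hypothetical, not part of xarray, and assumes all inputs carry the same name.

```python
import xarray as xr


def combine_named_dataarrays(arrays):
    """Hypothetical helper (not part of xarray): combine DataArrays that share
    a single name and return a DataArray carrying that name."""
    arrays = list(arrays)
    names = {da.name for da in arrays}
    if len(names) != 1:
        raise ValueError(f"expected one shared name, got {names}")
    (shared_name,) = names

    unnamed = []
    for da in arrays:
        tmp = da.copy(deep=False)  # shallow copy: the caller's arrays keep their names
        tmp.name = None
        unnamed.append(tmp)

    # With only unnamed DataArrays as input, combine_by_coords returns a DataArray (Case A)
    combined = xr.combine_by_coords(unnamed)
    return combined if shared_name is None else combined.rename(shared_name)


xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
named = [xr.Dataset({"name": xda1})["name"], xr.Dataset({"name": xda2})["name"]]

result = combine_named_dataarrays(named)
print(result)
```

Inputs with different names are rejected rather than combined, since in that case a Dataset with one variable per name is arguably the right result.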

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.


