Description
What happened?
When combining 2 DataArrays that both have the name 'name', I got a Dataset with a single variable 'name' when I expected to get a DataArray. The DataArrays were named because they were extracted from Datasets.
What did you expect to happen?
I expected that combining two DataArrays would result in a DataArray, even if they have a name.
Minimal Complete Verifiable Example
Steps to reproduce
import xarray as xr
xr.show_versions()
Case A: combining 2 unnamed DataArrays
Expected: 2 DataArrays produce 1 DataArray
Actual: OK
xda1 = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]})
xda2 = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]})
result = xr.combine_by_coords([xda1, xda2])
print(result)
<xarray.DataArray (x: 2)> Size: 16B
array([1, 2])
Coordinates:
  * x        (x) int64 16B 1 2
Case B: combining 2 named DataArrays extracted from Datasets
The 2 extracted DataArrays have a name 'name'.
xda1_from_dataset = xr.Dataset(data_vars={"name": xda1})["name"]
xda2_from_dataset = xr.Dataset(data_vars={"name": xda2})["name"]
print(xda1_from_dataset)
print(xda2_from_dataset)
<xarray.DataArray 'name' (x: 1)> Size: 8B
array([1])
Coordinates:
  * x        (x) int64 8B 1
<xarray.DataArray 'name' (x: 1)> Size: 8B
array([2])
Coordinates:
  * x        (x) int64 8B 2
Expected: 2 named DataArrays produce 1 named DataArray
Actual: the result is a Dataset with a variable 'name'
result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset])
print(result)
<xarray.Dataset> Size: 32B
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 16B 1 2
Data variables:
    name     (x) int64 16B 1 2
Squeezing does not turn the Dataset into a DataArray
result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset]).squeeze()
print(result)
<xarray.Dataset> Size: 32B
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 16B 1 2
Data variables:
    name     (x) int64 16B 1 2
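As far as I understand it, squeeze only removes length-1 dimensions and never changes the container type, which would explain the output above. A minimal sketch, using a made-up single-element Dataset (not part of the original example) so that there is actually something to squeeze:

```python
import xarray as xr

# Illustrative Dataset with one variable along a length-1 dimension "x".
ds = xr.Dataset({"name": ("x", [1])}, coords={"x": [1]})

squeezed = ds.squeeze()

# The length-1 dimension is dropped, but the object is still a Dataset
# that holds the variable "name".
print(type(squeezed).__name__)
print(list(squeezed.data_vars))
print(dict(squeezed.sizes))
```

In the combined result above, x has length 2, so squeeze has nothing to remove at all.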
The only way to get a DataArray is to explicitly select the variable by name.
result = xr.combine_by_coords([xda1_from_dataset, xda2_from_dataset])["name"]
print(result)
<xarray.DataArray 'name' (x: 2)> Size: 16B
array([1, 2])
Coordinates:
  * x        (x) int64 16B 1 2
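For now, the manual selection can be wrapped in a small helper. This is only a sketch of a workaround: `combine_to_dataarray` is a hypothetical name of mine, not an xarray function, and it assumes that every input DataArray carries the same name, so that the combined Dataset holds exactly one variable.

```python
import xarray as xr


def combine_to_dataarray(arrays):
    """Hypothetical helper: combine DataArrays by coords, return a DataArray."""
    combined = xr.combine_by_coords(arrays)

    # Unnamed inputs already come back as a DataArray (Case A).
    if isinstance(combined, xr.DataArray):
        return combined

    # Named inputs come back as a Dataset (Case B); unwrap the single variable.
    names = list(combined.data_vars)
    if len(names) != 1:
        raise ValueError(f"expected exactly one data variable, got {names}")
    return combined[names[0]]


a = xr.DataArray(data=[1], dims=["x"], coords={"x": [1]}, name="name")
b = xr.DataArray(data=[2], dims=["x"], coords={"x": [2]}, name="name")

result = combine_to_dataarray([a, b])
print(result)
```

The helper raises instead of guessing when the inputs have different names, since in that case the combined Dataset holds several variables and there is no single DataArray to return.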
Is there something I missed regarding combine_by_coords?
The current behaviour surprised me. IMO, combining 2 DataArrays, named or not, should produce a DataArray, not a Dataset, but maybe I'm wrong.
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.13.1 (main, Feb 10 2025, 10:59:30) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2025.9.0
pandas: 2.3.2
numpy: 2.3.3
scipy: 1.16.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.9.0
distributed: None
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.1.1
conda: None
pytest: 8.4.2
mypy: None
IPython: 9.5.0
sphinx: None