Description
What is your issue?
xr.concat doesn't concatenate dimension coordinates along new dimensions, which leads to pretty unintuitive behavior.
Take this example (motivated by #7532 (reply in thread))
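import numpy as np
import xarray as xr

# Build two segments whose time coordinates hold different values
segments = []
for i in range(2):
    time = np.sort(np.random.random(4))
    da = xr.DataArray(
        np.random.randn(4, 2),
        dims=["time", "cols"],
        coords=dict(time=("time", time), cols=["col1", "col2"]),
    )
    segments.append(da)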
In [86]: segments
Out[86]:
[<xarray.DataArray (time: 4, cols: 2)>
array([[-0.61199576, -0.9012078 ],
[-0.54187577, 1.30509994],
[-3.53720471, 0.97607797],
[ 0.2593455 , 0.95920031]])
Coordinates:
* time (time) float64 0.1048 0.168 0.869 0.9432
* cols (cols) <U4 'col1' 'col2',
<xarray.DataArray (time: 4, cols: 2)>
array([[ 0.90266408, -0.54294821],
[-1.09087103, -0.17484417],
[-0.21679558, -0.57377412],
[ 0.07570151, 0.27433728]])
Coordinates:
* time (time) float64 0.03627 0.09754 0.2434 0.592
* cols (cols) <U4 'col1' 'col2']
In [85]: xr.concat(segments, dim='new')
Out[85]:
<xarray.DataArray (new: 2, time: 8, cols: 2)>
array([[[ nan, nan],
[ nan, nan],
[-0.61199576, -0.9012078 ],
[-0.54187577, 1.30509994],
[ nan, nan],
[ nan, nan],
[-3.53720471, 0.97607797],
[ 0.2593455 , 0.95920031]],
[[ 0.90266408, -0.54294821],
[-1.09087103, -0.17484417],
[ nan, nan],
[ nan, nan],
[-0.21679558, -0.57377412],
[ 0.07570151, 0.27433728],
[ nan, nan],
[ nan, nan]]])
Coordinates:
* time (time) float64 0.03627 0.09754 0.1048 0.168 ... 0.592 0.869 0.9432
* cols (cols) <U4 'col1' 'col2'
Dimensions without coordinates: new
I would have expected to get a result of size {new: 2, time: 4, cols: 2}. That would be intuitive, because the default is coords='different', and that would be the result of concatenating each time coordinate (which have different values) and just propagating the cols coordinate (as they have the same values).
Instead what happened is that xr.concat treats the dimension coordinates as indexes to align, and defaults to an outer join. This auto-alignment behaviour has been discussed at length before; I'm just trying to point out another place in which it's problematic.
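To make the alignment step visible, here is a minimal sketch that passes the join argument of xr.concat explicitly. It assumes a seeded NumPy generator (not part of the original example) so that the eight time values are distinct and reproducible. With join="outer" the time indexes are unioned before stacking. With join="override" the first segment's time index is reused for every segment, which gives the expected shape but discards the other segments' time values.

```python
import numpy as np
import xarray as xr

# Seeded generator: an assumption made here for reproducibility only.
rng = np.random.default_rng(0)

segments = []
for i in range(2):
    time = np.sort(rng.random(4))
    da = xr.DataArray(
        rng.standard_normal((4, 2)),
        dims=["time", "cols"],
        coords=dict(time=("time", time), cols=["col1", "col2"]),
    )
    segments.append(da)

# Outer join: the two time indexes are unioned, so missing rows become NaN.
outer = xr.concat(segments, dim="new", join="outer")

# Override: the time index of the first segment is copied onto the others.
override = xr.concat(segments, dim="new", join="override")

print(dict(outer.sizes))
print(dict(override.sizes))
```

Note that join="override" silently relabels the second segment with the first segment's times, so it is only appropriate when that loss of information is acceptable.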
This is kind of briefly mentioned in the concat docstring under coords='all':
“all”: All coordinate variables will be concatenated, except those corresponding to other dimensions.
but it's not even mentioned under coords='different'.
I don't really know what I would prefer to happen with the coordinates. I guess to have created a time coordinate of size {new: 2, time: 4, cols: 2}, but then I don't know what that implies for the underlying index. @benbovy do you have any thoughts?
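As a rough illustration of what such a result could look like today, the sketch below drops the time index from each segment before concatenating and then re-attaches the per-segment times as a two-dimensional, non-index coordinate. This is a workaround under stated assumptions, not proposed xr.concat behaviour: the coordinate name segment_time is hypothetical and chosen only to avoid reusing the dimension name, and the seeded generator is added here for reproducibility.

```python
import numpy as np
import xarray as xr

# Seeded generator: an assumption made here for reproducibility only.
rng = np.random.default_rng(0)

segments = []
for i in range(2):
    time = np.sort(rng.random(4))
    da = xr.DataArray(
        rng.standard_normal((4, 2)),
        dims=["time", "cols"],
        coords=dict(time=("time", time), cols=["col1", "col2"]),
    )
    segments.append(da)

# Without a time index there is nothing to align, so the data is stacked as-is.
stripped = [da.drop_vars("time") for da in segments]
stacked = xr.concat(stripped, dim="new")

# Re-attach each segment's times as a (new, time) coordinate.
# "segment_time" is a hypothetical name, not something xarray creates itself.
time_2d = np.stack([da["time"].values for da in segments])
result = stacked.assign_coords(segment_time=(("new", "time"), time_2d))

print(dict(result.sizes))
print(result["segment_time"].dims)
```

Because segment_time is not backed by an index, label-based selection along time (for example with .sel) is not available on the result, which is the open question about the underlying index.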
At the very least we should make this a lot clearer in the docs.