Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't unstack concatenated DataArrays #7076

Open
4 tasks done
DWesl opened this issue Sep 24, 2022 · 11 comments
Open
4 tasks done

Can't unstack concatenated DataArrays #7076

DWesl opened this issue Sep 24, 2022 · 11 comments
Labels
bug plan to close May be closeable, needs more eyeballs topic-indexing

Comments

@DWesl
Copy link
Contributor

DWesl commented Sep 24, 2022

What happened?

I had a collection of DataArrays with a stacked dimension (dimension whose corresponding index is a MultiIndex). I concatenated them into a single DataArray, then tried to unstack the stacked dimension, which failed. Performing the operations in the other order works (unstacking each DataArray, then concatenating the unstacked arrays).

What did you expect to happen?

I expected that concatenating the arrays then unstacking them would produce the same array as unstacking them then concatenating them, but with the possibility of saving the intermediate concatenated-but-still-stacked DataArray for later use as a template.

Minimal Complete Verifiable Example

import pandas as pd
import xarray
index = pd.MultiIndex.from_product([range(3), range(5)])
arr = xarray.DataArray.from_series(pd.Series(range(15), index=index)).stack(index0=["level_0", "level_1"])
arr.unstack("index0")

arr2 = xarray.concat([arr, arr], dim="index2")
arr2.unstack("index0")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

<xarray.DataArray (level_0: 3, level_1: 5)>
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
Coordinates:
  * level_0  (level_0) int64 0 1 2
  * level_1  (level_1) int64 0 1 2 3 4

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.conda/envs/plotting/lib/python3.10/site-packages/xarray/core/dataarray.py", line 2402, in unstack
    ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
  File "~/.conda/envs/plotting/lib/python3.10/site-packages/xarray/core/dataset.py", line 4618, in unstack
    raise ValueError(
ValueError: cannot unstack dimensions that do not have exactly one multi-index: ('index0',)

Anything else we need to know?

The eventual problem to which I wish to apply the solution has two stacked dimensions rather than one, but that's likely irrelevant.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.76.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.2.1.post0
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.1
cartopy: 0.20.3
seaborn: 0.12.0
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 61.3.1
pip: 22.0.4
conda: 4.14.0
pytest: 7.1.3
IPython: None
sphinx: None

@DWesl DWesl added bug needs triage Issue that has not been reviewed by xarray team member labels Sep 24, 2022
@benbovy
Copy link
Member

benbovy commented Sep 26, 2022

Hi @DWesl,

Your example is working with the main branch, it has been fixed in #6889.

# after #6889
arr2.unstack("index0")
# <xarray.DataArray (index2: 2, level_0: 3, level_1: 5)>
# array([[[ 0,  1,  2,  3,  4],
#         [ 5,  6,  7,  8,  9],
#         [10, 11, 12, 13, 14]],
# 
#        [[ 0,  1,  2,  3,  4],
#         [ 5,  6,  7,  8,  9],
#         [10, 11, 12, 13, 14]]])
# Coordinates:
#   * level_0  (level_0) int64 0 1 2
#   * level_1  (level_1) int64 0 1 2 3 4
# Dimensions without coordinates: index2

@benbovy benbovy closed this as completed Sep 26, 2022
@benbovy benbovy added topic-indexing and removed needs triage Issue that has not been reviewed by xarray team member labels Sep 26, 2022
@DWesl
Copy link
Contributor Author

DWesl commented Sep 27, 2022

Fix confirmed, thank you.

@ACHMartin
Copy link

ACHMartin commented Feb 15, 2023

Hi @DWesl, @benbovy

I got a bug very similar to what was initially report with the following minimal example (xarray version 2022.11.0) with the error:

ValueError: cannot unstack dimensions that do not have exactly one multi-index: ('z',)

It is working fine with my previous version (xarray 0.16.0)

import xarray as xr
ds = xr.Dataset(
    data_vars=dict(
        mydata = ( ['across', 'along'], [[0,1],[2,3]])
    )
)
stacked = ds.stack(z=("across", "along"))
newlist = [None] * stacked.z.size
for ii, zindex in enumerate(stacked.z.data):
    newlist[ii] = stacked.mydata.sel(z=zindex)
newds = xr.concat(newlist, dim='z')
newds['z'] = stacked.z
print('xarray version:' + xr.__version__)
newds.unstack(dim='z')

If I do newds = newds.reset_index('z') just before the last line, it is solved for my minimal example. But it still doesn't work for my real problem and I don't think it is the best way to proceed.

Many thanks for your help,
Adrien

@benbovy
Copy link
Member

benbovy commented Feb 15, 2023

@ACHMartin the issue is when you do newds['z'] = stacked.z. In the last versions of Xarray multi-index levels have each their own (real) coordinates, for consistency and clarity we soon won't support assigning a multi-index to a single coordinate of a Dataset / DataArray like that.

I think that in other places we still do support it with a deprecation notice, but apparently in your example this is not the case. unstack doesn't work because the multi-index(es) and the coordinates of newds are not consistent.

I don't know exactly what is your real problem, but from now on you should avoid implicitly assign a multi-index with xr_obj["my_coord"] = ... or xr_obj.assign(my_coord=...). Instead you should re-create the multi-index, e.g., in your minimal example newds = newds.set_index(z=["across", "along"]).

@ACHMartin
Copy link

ACHMartin commented Feb 15, 2023

Many thanks for the explanation and the rapid answer. I am not very familiar with multi-index.

My real problem works with a DataSet with different sort of variables:

ipdb> lmmap
<xarray.Dataset>
Dimensions:        (x_variables: 4, Observables: 2, Antenna: 4, z: 4,
                    Ambiguities: 4, fun_variables: 8, extrema: 2)
Coordinates:
  * x_variables    (x_variables) <U3 'u' 'v' 'c_u' 'c_v'
  * Observables    (Observables) <U6 'sigma0' 'RVL'
  * Antenna        (Antenna) <U4 'Fore' 'MidV' 'MidH' 'Aft'
    fun_variables  (Observables, Antenna) int64 0 1 2 3 4 5 6 7
Dimensions without coordinates: z, Ambiguities, extrema
Data variables: (12/17)
    x              (z, Ambiguities, x_variables) float64 -4.503 7.8 ... -0.4265
    active_mask    (z, Ambiguities, x_variables) int64 0 0 0 0 0 0 ... 0 0 0 0 0
    grad           (z, Ambiguities, x_variables) float64 7.816e-11 ... -1.873...
    fun            (z, Ambiguities, Observables, Antenna) float64 0.001137 .....
    jac            (z, Ambiguities, fun_variables, x_variables) float64 0.359...
    x0             (z, Ambiguities, x_variables) float64 -7.002 10.03 ... 1.219
    ...             ...
    message        (z, Ambiguities) <U42 '`xtol` termination condition is sat...
    success        (z, Ambiguities) bool True True True True ... True True True
    method         (z) <U3 'trf' 'trf' 'trf' 'trf'
    xtol           (z) float64 0.001 0.001 0.001 0.001
    x_scale        (z, x_variables) float64 7.0 7.0 0.5 0.5 ... 7.0 7.0 0.5 0.5
    bounds         (z, extrema, x_variables) int64 -30 -30 -5 -5 ... 30 30 5 5

I don't know where I should apply set_index(z=["across", "along"]) I tried on the full DataSet or a single DataArray but without success. I get the following error:

ValueError: across, along variable(s) do not exist

Which is obvious as across and along variables disappeared. Thank you, I will dig in this direction why it happens.

However, it works perfectly with the minimal example.
Many thanks

@dcherian
Copy link
Contributor

the issue is when you do newds['z'] = stacked.z

Are we raising a warning here?

@ACHMartin
Copy link

I didn't see any warning for this

@ACHMartin
Copy link

Many thanks for the explanation and the rapid answer. I am not very familiar with multi-index.

My real problem works with a DataSet with different sort of variables:

ipdb> lmmap
<xarray.Dataset>
Dimensions:        (x_variables: 4, Observables: 2, Antenna: 4, z: 4,
                    Ambiguities: 4, fun_variables: 8, extrema: 2)
Coordinates:
  * x_variables    (x_variables) <U3 'u' 'v' 'c_u' 'c_v'
  * Observables    (Observables) <U6 'sigma0' 'RVL'
  * Antenna        (Antenna) <U4 'Fore' 'MidV' 'MidH' 'Aft'
    fun_variables  (Observables, Antenna) int64 0 1 2 3 4 5 6 7
Dimensions without coordinates: z, Ambiguities, extrema
Data variables: (12/17)
    x              (z, Ambiguities, x_variables) float64 -4.503 7.8 ... -0.4265
    active_mask    (z, Ambiguities, x_variables) int64 0 0 0 0 0 0 ... 0 0 0 0 0
    grad           (z, Ambiguities, x_variables) float64 7.816e-11 ... -1.873...
    fun            (z, Ambiguities, Observables, Antenna) float64 0.001137 .....
    jac            (z, Ambiguities, fun_variables, x_variables) float64 0.359...
    x0             (z, Ambiguities, x_variables) float64 -7.002 10.03 ... 1.219
    ...             ...
    message        (z, Ambiguities) <U42 '`xtol` termination condition is sat...
    success        (z, Ambiguities) bool True True True True ... True True True
    method         (z) <U3 'trf' 'trf' 'trf' 'trf'
    xtol           (z) float64 0.001 0.001 0.001 0.001
    x_scale        (z, x_variables) float64 7.0 7.0 0.5 0.5 ... 7.0 7.0 0.5 0.5
    bounds         (z, extrema, x_variables) int64 -30 -30 -5 -5 ... 30 30 5 5

I don't know where I should apply set_index(z=["across", "along"]) I tried on the full DataSet or a single DataArray but without success. I get the following error:

ValueError: across, along variable(s) do not exist

Which is obvious as across and along variables disappeared. Thank you, I will dig in this direction why it happens.

However, it works perfectly with the minimal example. Many thanks

Thanks @benbovy, my issue is now resolved, once I have enabled the coords to follow.

ACHMartin added a commit to ACHMartin/seastar_project that referenced this issue Feb 16, 2023
…y one multi-index: ('z',)" occuring in level2/wind_current_retrieval with version xarray 2022.11.0 (was working in xarray v0.16.0) on unstack on 'z'.

Explanation of the pb here:
pydata/xarray#7076 (comment)
Push in cost_function/find_minima the level1 coords to the output
@ACHMartin
Copy link

Hi @benbovy, my code works for 2 dimensions, but it doesn't with 1D. I think the error comes from .set_index(z=['across']) which delivers a "PandasIndex(Int64Index(" and not the expected "PandasIndex(MultiIndex(" for unstack.

I created a Pandas-MultiIndex "MultiIndex([((0,), (1,))], names=['z', 'across'])" (see code just below), but it is not exactly the same one as the one created by .stack() "PandasIndex(MultiIndex([(0,),(1,)], name='z'))" and I didn't find out how to pass the 'index' to xarray.

import pandas as pd
tuples = [(0,),(1,)],
index = pd.MultiIndex.from_tuples(tuples, names=["z","across"])
index

My modified minimal example is as below:

import xarray as xr
ds = xr.Dataset(
    data_vars=dict(
        mydata = ( ['across', 'along'], [[0,1],[2,3]])
    ),
    coords=dict( # <- changed 
        across=[0,1], # <- changed 
        along=[0,1], # <- changed 
    ),    # <- changed
)
stacked = ds.isel(along=0).stack(z=["across"]) # <- changed
newlist = [None] * stacked.z.size
for ii, zindex in enumerate(stacked.z.data):
    newlist[ii] = stacked.mydata.sel(z=zindex)
newds = xr.concat(newlist, dim='z')
newds = newds.set_index(z=['across']) # <- changed
newds.unstack(dim='z') # Fail with ValueError: cannot unstack dimensions that do not have exactly one multi-index: ('z',)

ACHMartin added a commit to ACHMartin/seastar_project that referenced this issue Feb 17, 2023
…y one multi-index: ('z',)" occuring in level2/wind_current_retrieval with version xarray 2022.11.0 (was working in xarray v0.16.0) on unstack on 'z'.

Explanation of the pb here:
pydata/xarray#7076 (comment)
Push in cost_function/find_minima the level1 coords to the output
@dcherian dcherian reopened this Feb 21, 2023
@max-sixty
Copy link
Collaborator

Is the underlying cause of this a dupe of #7148?

@max-sixty max-sixty added the plan to close May be closeable, needs more eyeballs label Oct 19, 2024
@benbovy
Copy link
Member

benbovy commented Oct 19, 2024

I don't think it is a duplicate.

I don't know what @ACHMartin's last minimal example is trying to do, but now the best way to assign a pandas multi-index to a Xarray object is like this:

index = pd.MultiIndex.from_tuples(..., names=["across"])
coords = xr.Coordinates.from_pandas_multiindex(index, dim="z")
ds = xr.Dataset(coords=coords)

It looks like ds.set_index(z=["across"]) demotes to a single index in the case when a single item list is passed. I'm not sure what was the old behavior (a while ago... before we switched to calendar versioning) and whether or not it should be considered as a bug, though.

Regarding assignment of a new coordinate to an existing multi-index coordinate (also reported here above), if we still don't raise an error or a warning we should probably do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug plan to close May be closeable, needs more eyeballs topic-indexing
Projects
None yet
Development

No branches or pull requests

5 participants