>>> # da["location"] looks like a normal DataArray
>>> location = da["location"]
>>> location
<xarray.DataArray 'location' (location: 2)>
array([(10, 20), (11, 21)], dtype=object)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11
  - lon       (location) int64 20 21
>>> # but in actual fact, the variable._data is a MultiIndex
>>> location.variable._data
PandasIndexAdapter(array=MultiIndex([(10, 20),
            (11, 21)],
           names=['lat', 'lon']), dtype=dtype('O'))
This is why an error is thrown: variable.assert_unique_multiindex_level_names gets passed two variables: location.variable (the DataArray data values), and also location["location"].variable (the coordinate values), which are both MultiIndexes.
Multiindex level name conflicts should only be checked for coordinates, not data variables.
But I've only spent a few hours digging through the codebase to try and understand this problem - I'm not quite sure what the implications would be.
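To make the proposal concrete, here is a minimal pure-Python sketch of the difference between checking every variable and checking only coordinates. It is a simplified stand-in, not xarray's actual implementation: the function `find_level_name_conflicts`, its `coord_names` parameter, and the plain-dict representation of variables are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def find_level_name_conflicts(variables, coord_names=None):
    """Return {level_name: [variable names]} for level names used more than once.

    `variables` maps a variable name to a tuple of MultiIndex level names,
    or to None when the variable is not backed by a MultiIndex.
    If `coord_names` is given, only those variables are checked.
    (Hypothetical helper; not part of xarray.)
    """
    users = defaultdict(list)
    for var_name, level_names in variables.items():
        if coord_names is not None and var_name not in coord_names:
            continue  # proposed behaviour: skip data variables
        for level in level_names or ():
            users[level].append(var_name)
    return {level: names for level, names in users.items() if len(names) > 1}


# Stand-in for xr.Dataset({"data": location}): both the coordinate and the
# data variable are backed by the same (lat, lon) MultiIndex.
variables = {"location": ("lat", "lon"), "data": ("lat", "lon")}

# Checking every variable reports a conflict for each level name.
conflicts_all = find_level_name_conflicts(variables)

# Checking only coordinates finds nothing to complain about.
conflicts_coords = find_level_name_conflicts(variables, coord_names={"location"})

print(conflicts_all)
print(conflicts_coords)
```

In this toy model the first call flags both `lat` and `lon` as shared between `location` and `data`, which mirrors the `ValueError` reported in this issue, while the coords-only call returns an empty mapping.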
Here is another place where it feels like it makes more sense to only check the MultiIndex level names of coords:
>>> da = xr.DataArray([0, 1], dims=["location"], coords={"lat": ("location", [10, 11]), "lon": ("location", [20, 21])}).set_index(location=["lat", "lon"])
>>> location = da["location"]
>>> # you cannot directly make a dataset with `location` as a data variable
>>> xr.Dataset({"data": location})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/code/xarray/xarray/core/dataset.py", line 541, in __init__
    variables, coord_names, dims, indexes = merge_data_and_coords(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 466, in merge_data_and_coords
    return merge_core(
  File "/home/harry/code/xarray/xarray/core/merge.py", line 556, in merge_core
    assert_unique_multiindex_level_names(variables)
  File "/home/harry/code/xarray/xarray/core/variable.py", line 2363, in assert_unique_multiindex_level_names
    raise ValueError("conflicting MultiIndex level name(s):\n%s" % conflict_str)
ValueError: conflicting MultiIndex level name(s):
'lat' (location), 'lat' (data)
'lon' (location), 'lon' (data)
```python
# but if you go a round-about way, you can exploit that assign_coords only checks
# the multiindex names of coordinates, not data variables
>>> ds = xr.Dataset({"data": xr.DataArray(data=location.variable._data, dims=["location"])})
>>> ds = ds.assign_coords({"location": location})
>>> ds
<xarray.Dataset>
Dimensions:   (location: 2)
Coordinates:
  * location  (location) MultiIndex
  - lat       (location) int64 10 11
  - lon       (location) int64 20 21
Data variables:
    data      (location) object (10, 20) (11, 21)
>>> ds["data"].variable._data
PandasIndexAdapter(array=MultiIndex([(10, 20),
            (11, 21)],
           names=['lat', 'lon']), dtype=dtype('O'))
```
If making variable.assert_unique_multiindex_level_names only check coords is the way to go, I'm keen + happy to try putting together a pull request for this.
MCVE Code Sample
Expected Output
The output should be the same as first concatenating the DataArrays, then extracting the dimension location:
Problem Description
This is why an error is thrown: `variable.assert_unique_multiindex_level_names` gets passed two variables: `location.variable` (the DataArray data values), and also `location["location"].variable` (the coordinate values), which are both MultiIndexes.
Output of `xr.show_versions()`
xarray: 0.14.1+36.gb3d3b44
pandas: 0.25.3
numpy: 1.18.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 42.0.2.post20191201
pip: 19.3.1
conda: None
pytest: 5.3.2
IPython: None
sphinx: None