Multi-index with categorical values #3674

Closed
@mancellin

Building a dataset from a pandas DataFrame whose multi-index has categorical levels:

import pandas as pd

cat = pd.CategoricalDtype(categories=['foo', 'bar', 'baz'])
i1 = pd.Series(['foo', 'bar'], dtype=cat)
i2 = pd.Series(['bar', 'bar'], dtype=cat)

df = pd.DataFrame({'i1': i1, 'i2': i2, 'values': [1, 2]})
ds = df.set_index(['i1', 'i2']).to_xarray()

print(ds)

Expected output:

<xarray.Dataset>
Dimensions:  (i1: 2, i2: 1)
Coordinates:
  * i1       (i1) object 'foo' 'bar'
  * i2       (i2) object 'bar'
Data variables:
    values   (i1, i2) int64 1 2

Actual output:

<xarray.Dataset>
Dimensions:  (i1: 3, i2: 3)
Coordinates:
  * i1       (i1) object 'foo' 'bar' 'baz'
  * i2       (i2) object 'foo' 'bar' 'baz'
Data variables:
    values   (i1, i2) float64 nan 1.0 nan nan 2.0 nan nan nan nan

The actual output is not wrong, but it is inconsistent with two other cases: the non-categorical multi-index case (which gives the expected output above) and the single-index case (where a categorical index is not padded with NaNs for unused categories).
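A likely explanation, sketched here with pandas only: when a multi-index is built from categorical columns, each level keeps every declared category, whereas object columns yield levels with only the observed values. The assumption (not confirmed in this report) is that `to_xarray()` sizes each dimension from those levels. The variable names below are illustrative, and dropping unused categories is shown as one possible workaround, not as an official fix.

```python
import pandas as pd

cat = pd.CategoricalDtype(categories=["foo", "bar", "baz"])
df = pd.DataFrame(
    {
        "i1": pd.Series(["foo", "bar"], dtype=cat),
        "i2": pd.Series(["bar", "bar"], dtype=cat),
        "values": [1, 2],
    }
)

# Categorical columns: each index level carries every declared category,
# including the ones that never occur in the data.
cat_index = df.set_index(["i1", "i2"]).index
cat_level_sizes = [len(level) for level in cat_index.levels]

# Object columns: each index level carries only the observed values.
obj_df = df.astype({"i1": object, "i2": object})
obj_index = obj_df.set_index(["i1", "i2"]).index
obj_level_sizes = [len(level) for level in obj_index.levels]

# Possible workaround: drop unused categories before building the index.
trimmed = df.copy()
for name in ["i1", "i2"]:
    trimmed[name] = trimmed[name].cat.remove_unused_categories()
trimmed_index = trimmed.set_index(["i1", "i2"]).index
trimmed_level_sizes = [len(level) for level in trimmed_index.levels]

print(cat_level_sizes)      # [3, 3]
print(obj_level_sizes)      # [2, 1]
print(trimmed_level_sizes)  # [2, 1]
```

The level sizes match the dimensions in the two outputs above: 3 x 3 for the categorical index and 2 x 1 for the object index. If the assumption holds, calling `to_xarray()` on `trimmed.set_index(['i1', 'i2'])` should give dimensions of the expected size, though that has not been checked here against xarray 0.14.1.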

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.0 (default, Nov 6 2019, 21:49:08) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.91-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: None
libnetcdf: None

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200106
pip: 19.3.1
conda: None
pytest: None
IPython: None
sphinx: None
