Dataset.from_dataframe: deprecate expanding the multi-index #8166

Open
@benbovy

What is your issue?

Let's continue here the discussion from #8140 (comment) about changing the behavior of Dataset.from_dataframe. Quoting that comment:

The current behaviour of Dataset.from_dataframe where it always unstacks feels wrong to me.
To me, it seems sensible that Dataset.from_dataframe(df) automatically creates a Dataset with PandasMultiIndex if df has a MultiIndex. The user can then use that or quite easily unstack to a dense or sparse array.
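
For reference, here is a minimal sketch of the current unstacking behaviour, together with one way to get a multi-indexed dimension by hand today. The example data and the dimension name "z" are made up for illustration; the stack-then-dropna step is only a workaround, not the proposed API.

```python
import pandas as pd
import xarray as xr

# A DataFrame whose MultiIndex covers only 3 of the 4 possible (x, y) pairs.
midx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["x", "y"]
)
df = pd.DataFrame({"foo": [1.0, 2.0, 3.0]}, index=midx)

# Current behaviour: the multi-index is expanded into separate "x" and "y"
# dimensions, and the missing ("b", 2) entry is filled with NaN.
ds = xr.Dataset.from_dataframe(df)
print(dict(ds.sizes))
print(int(ds["foo"].isnull().sum()))

# Workaround to get back a single multi-indexed dimension ("z" is an
# arbitrary name): stack the expanded dimensions, then drop the filler.
stacked = ds.stack(z=("x", "y")).dropna("z")
print(stacked.indexes["z"])
```

With the behaviour suggested above, the intermediate dense (x, y) array would never be built unless the user asked for it.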

If Dataset.from_dataframe no longer unstacks the multi-index, are we OK with the "Dataset -> DataFrame -> Dataset" round-trip no longer returning the original dataset unless we unstack explicitly?

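import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), [[1, 2], [3, 4]])},
    coords={"x": ["a", "b"], "y": [1, 2]},
)

df = ds.to_dataframe()  # "foo" column indexed by an ("x", "y") MultiIndex

# Sketch of the proposed behaviour: `dim` is part of the proposal and would
# name the dimension that holds the un-expanded multi-index.
ds2 = xr.Dataset.from_dataframe(df, dim="z")

ds2.identical(ds)  # False

ds2.unstack("z").identical(ds)  # True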
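
The same round-trip concern can be tried with existing API by using Dataset.stack as a stand-in for the proposed result. This is only an approximation under that assumption: `stacked` plays the role of `ds2`, and "z" is again an arbitrary dimension name.

```python
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), [[1, 2], [3, 4]])},
    coords={"x": ["a", "b"], "y": [1, 2]},
)

# Stand-in for the proposed from_dataframe result: one dimension "z"
# backed by a multi-index over the "x" and "y" levels.
stacked = ds.stack(z=("x", "y"))
print(stacked.identical(ds))  # the stacked form differs from the original

# An explicit unstack restores the original (x, y) layout.
restored = stacked.unstack("z")
print(restored["foo"].dims)
print(restored["foo"].values.tolist())
```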

cc @max-sixty @dcherian
