
Update group by multi index #8162

Open
@dcherian

Description


Ideally, `GroupBy._infer_concat_args()` would return an `xr.Coordinates` object containing both the coordinate(s) and their (multi-)index, to be assigned to the result (combined) object.

The goal is to avoid calling `create_default_index_implicit(coord)` below, where `coord` is a `pd.MultiIndex` or a single `IndexVariable` wrapping a multi-index. If `coord` were a `Coordinates` object, we could do `combined = combined.assign_coords(coord)` instead.
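A minimal sketch of the public API this would rely on, assuming xarray >= 2023.08 (where `xr.Coordinates.from_pandas_multiindex` exists and `assign_coords` accepts a `Coordinates` object). The multi-index and dataset below are made up for illustration; the point is that the `PandasMultiIndex` travels together with its coordinates, so no default index needs to be rebuilt from a bare `pd.MultiIndex`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative multi-index along a dimension named "x" (not from the issue).
midx = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1]], names=("letter", "number")
)

# One Coordinates object bundling the "x", "letter" and "number"
# coordinates together with a single PandasMultiIndex shared by all three.
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

# A stand-in for the combined result: it has the "x" dimension
# but no coordinates along it yet.
combined = xr.Dataset({"foo": ("x", np.arange(4))})

# Assigning a Coordinates object reuses its index as-is
# instead of creating a new default index.
combined = combined.assign_coords(coords)

print(combined.xindexes["x"])
```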

xarray/xarray/core/groupby.py

Lines 1573 to 1587 in e2b6f34

```python
def _combine(self, applied):
    """Recombine the applied objects like the original."""
    applied_example, applied = peek_at(applied)
    coord, dim, positions = self._infer_concat_args(applied_example)
    combined = concat(applied, dim)
    (grouper,) = self.groupers
    combined = _maybe_reorder(combined, dim, positions, N=grouper.group.size)
    # assign coord when the applied function does not return that coord
    if coord is not None and dim not in applied_example.dims:
        index, index_vars = create_default_index_implicit(coord)
        indexes = {k: index for k in index_vars}
        combined = combined._overwrite_indexes(indexes, index_vars)
    combined = self._maybe_restore_empty_groups(combined)
    combined = self._maybe_unstack(combined)
    return combined
```
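With a `Coordinates` object, the assignment branch of `_combine` could shrink to a single `assign_coords` call. The function below is a hypothetical, standalone sketch of just that step (`combine_reduced` is not an xarray function): it takes per-group results in which the grouped dimension was reduced away, concatenates them with the public `xr.concat`, and assigns a `Coordinates` object carrying the multi-index. Reordering, empty-group restoration and unstacking are deliberately left out.

```python
import pandas as pd
import xarray as xr


def combine_reduced(applied, dim, coords):
    """Hypothetical stand-in for the coordinate-assignment part of _combine.

    ``applied`` is a sequence of per-group results, ``dim`` the grouped
    dimension, and ``coords`` an ``xr.Coordinates`` object (or None), such
    as the one ``_infer_concat_args`` might return under this proposal.
    """
    applied = list(applied)
    combined = xr.concat(applied, dim)
    # Only assign when the applied function dropped the grouped dimension,
    # mirroring the condition in the original _combine.
    if coords is not None and dim not in applied[0].dims:
        # The index inside ``coords`` is reused; no default index is built.
        combined = combined.assign_coords(coords)
    return combined


# Illustrative data: one scalar result per element of a multi-indexed "x".
midx = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1]], names=("letter", "number")
)
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
applied = [xr.Dataset({"foo": ((), float(i))}) for i in range(4)]

result = combine_reduced(applied, "x", coords)
print(result)
```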

There are actually more general issues:

  • Because the `group` parameter of `Dataset.groupby` is a single variable or variable name, it won't be possible to group by a full pandas multi-index once we drop its dimension coordinate (Deprecate the multi-index dimension coordinate #8143). How can we still support it? Maybe by passing a dimension name as `group` and checking that there's only one index for that dimension?
  • How can we support custom, multi-coordinate indexes with groupby? I don't have any practical example in mind, but in theory, passing just a single coordinate name as `group` will invalidate the index. Should we drop the index in the result? Or, as suggested above, pass a dimension name as `group` and check the index?
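A rough sketch of the check proposed in the first point, plus a way to list the coordinates the second point is about. Both helpers (`single_index_for_dim`, `coords_sharing_index`) are hypothetical, written only against public attributes (`xindexes`, `coords`, `set_xindex`, assuming a recent xarray); they are not part of xarray's API and ignore edge cases. The first returns the one index along a dimension and raises if there are several; the second returns every coordinate backed by the same index object as a given coordinate, i.e. the coordinates whose shared index would be invalidated if only one of them were passed as `group`.

```python
import pandas as pd
import xarray as xr


def single_index_for_dim(obj, dim):
    """Hypothetical helper: return the only index along ``dim``.

    Raises ValueError if zero or several distinct indexes are attached
    to coordinates along ``dim``.
    """
    # A multi-coordinate index (e.g. PandasMultiIndex) is stored once per
    # coordinate name, so deduplicate by object identity.
    unique = {}
    for name, index in obj.xindexes.items():
        if dim in obj.coords[name].dims:
            unique[id(index)] = index
    if len(unique) != 1:
        raise ValueError(
            f"expected exactly one index along {dim!r}, found {len(unique)}"
        )
    return next(iter(unique.values()))


def coords_sharing_index(obj, name):
    """Hypothetical helper: names of all coordinates backed by the same
    index object as coordinate ``name``."""
    index = obj.xindexes[name]
    return {k for k, idx in obj.xindexes.items() if idx is index}


# Illustrative data: a single multi-index along "x" ...
midx = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1]], names=("letter", "number")
)
ds = xr.Dataset(coords=xr.Coordinates.from_pandas_multiindex(midx, "x"))

# ... and two independent indexes ("x" and "y") along the same dimension.
ds2 = xr.Dataset(coords={"x": [0, 1], "y": ("x", [10, 20])}).set_xindex("y")

print(single_index_for_dim(ds, "x"))
print(sorted(coords_sharing_index(ds, "letter")))
```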

Originally posted by @benbovy in #8140 (comment)
