
Wrap numpy-groupies to speed up Xarray's groupby aggregations #4473

Closed
@shoyer

Description

Is your feature request related to a problem? Please describe.

Xarray's groupby aggregations (e.g., groupby(...).sum()) are very slow compared to pandas, as described in #659.

Describe the solution you'd like

We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package.
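As a rough illustration of where such a speedup can come from, here is a minimal sketch in plain NumPy (not numpy-groupies itself, and not xarray code) of the single-pass "group index" style of aggregation: the labels are factorized into integer group indices once, and then every value is added into its group's bin in one vectorized call. `grouped_sum` is a hypothetical helper; numpy-groupies exposes a comparable operation as `numpy_groupies.aggregate(group_idx, a, func="sum")`, with many more reductions and backends.

```python
import numpy as np


def grouped_sum(values, labels):
    """Vectorized grouped sum over a 1-D array.

    Hypothetical helper sketching the "group index" approach; it is not
    the numpy-groupies implementation.
    """
    # Factorize labels: sorted unique keys plus an integer group index per element.
    groups, group_idx = np.unique(labels, return_inverse=True)
    # Single pass over the data: bincount adds each value into its group's bin.
    sums = np.bincount(group_idx, weights=values, minlength=len(groups))
    return groups, sums


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array(["a", "b", "a", "c", "b"])
groups, sums = grouped_sum(values, labels)
print(groups, sums)  # ['a' 'b' 'c'] [4. 7. 4.]
```

The cost is one factorization plus one pass over the data, regardless of how many groups there are.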

Additional context

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now.
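For contrast, the indexing + concatenate strategy can be sketched roughly as follows. This is a simplified 1-D NumPy version written for illustration (xarray's actual code works on labeled arrays and uses integer positions rather than boolean masks); `grouped_sum_by_indexing` is a hypothetical name.

```python
import numpy as np


def grouped_sum_by_indexing(values, labels):
    """Loop-based grouped sum: select each group, reduce it, then
    concatenate the per-group results.

    Hypothetical, simplified sketch of the indexing + concatenate strategy.
    """
    groups = np.unique(labels)
    # One selection and one reduction per unique group.
    per_group = [values[labels == g].sum(keepdims=True) for g in groups]
    # Stitch the length-1 results back together along the grouped axis.
    return groups, np.concatenate(per_group)


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array(["a", "b", "a", "c", "b"])
groups, sums = grouped_sum_by_indexing(values, labels)
print(groups, sums)  # ['a' 'b' 'c'] [4. 7. 4.]
```

Because it only relies on indexing, a reduction and concatenation, this style works on any array type that supports those operations, which is presumably why it remains a reasonable fallback for dask and other duck arrays, at the cost of one selection-and-reduce step per group.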

In particular, it might make sense to stick with the current algorithm if the arrays being aggregated have many chunks along the "grouped" dimension (depending on the number of unique group values).
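One possible reading of that heuristic, sketched as a hypothetical helper: dask arrays describe their layout with a tuple of per-axis chunk sizes (the `.chunks` attribute), so the number of chunks along the grouped axis is just the length of that axis's entry. The ratio test against the number of groups and the `max_chunks_per_group` default below are illustrative assumptions, not tuned values, and neither function exists in xarray.

```python
def count_chunks_along(chunks, axis):
    """Number of chunks along `axis` for a dask-style chunks tuple,
    e.g. ((100, 100, 50), (10,)) has 3 chunks along axis 0."""
    return len(chunks[axis])


def choose_groupby_algorithm(chunks, axis, n_groups, max_chunks_per_group=1.0):
    """Hypothetical heuristic: keep indexing + concatenate when the grouped
    axis is split into many chunks relative to the number of groups;
    otherwise prefer a numpy-groupies style reduction (e.g. via apply_ufunc).

    The threshold rule is an illustrative assumption only.
    """
    n_chunks = count_chunks_along(chunks, axis)
    if n_chunks > max_chunks_per_group * n_groups:
        return "indexing+concatenate"
    return "numpy-groupies"


# 50 chunks along the grouped axis but only 12 groups: fall back.
print(choose_groupby_algorithm(((100,) * 50, (10,)), axis=0, n_groups=12))
# A single chunk along the grouped axis: use the vectorized path.
print(choose_groupby_algorithm(((5000,), (10,)), axis=0, n_groups=12))
```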
