Is your feature request related to a problem?
dask recently added `dask.array.shuffle` to help with some classic `GroupBy.map` problems. `shuffle` reorders the array so that all members of a single group are in a single chunk, with the possibility of multiple groups in a single chunk. I see a few ways to use this in Xarray:
1. `GroupBy.shuffle()`: This shuffles and returns a new GroupBy object with which to do further operations (e.g. `map`).
2. `Dataset.shuffle_by(Grouper)`: This shuffles and returns a new dataset (or dataarray), so that the shuffled data can be persisted to disk or you can do other things later (xref Saving the groups generated from groupby operation #5674).
3. Use `GroupBy.shuffle` under the hood in `DatasetGroupBy.quantile` and `DatasetGroupBy.median`, so that the exact quantile always works regardless of chunking (right now we raise an error). This seems like a no-brainer.
4. Add either a `shuffle` kwarg to `GroupBy.map` and/or `GroupBy.reduce`, or a new API (e.g. `GroupBy.transform` or `GroupBy.map_shuffled`) that will shuffle, then `xarray.map_blocks` a wrapper function that applies the `Groupby` on each block. This is how dask dataframe implements `Groupby.apply`.

#9320 implements (1,2). (1) is mostly for convenience; I could easily see us recommending using (2) before calling the GroupBy.
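
To make the idea concrete, here is a rough NumPy-only sketch of what a shuffle buys us. It does not call dask or xarray; `group_indexer`, `pack_into_chunks`, and `groupwise_median` are hypothetical helpers, and the greedy packing with `target_size` is an assumption for illustration, not how dask actually picks chunk sizes. It only shows that once no group is split across chunks, a blockwise exact median matches the global one.

```python
import numpy as np


def group_indexer(labels):
    """One list of integer positions per unique label, in sorted label order."""
    labels = np.asarray(labels)
    return [np.flatnonzero(labels == g).tolist() for g in np.unique(labels)]


def pack_into_chunks(indexer, target_size):
    """Greedily pack whole groups into chunks of about target_size elements.

    A group is never split, so a chunk may hold several groups, and a single
    large group may exceed target_size.
    """
    chunks, current = [], []
    for positions in indexer:
        if current and len(current) + len(positions) > target_size:
            chunks.append(current)
            current = []
        current = current + positions
    if current:
        chunks.append(current)
    return chunks


def groupwise_median(labels, values):
    """Exact median per group; only valid if each group is fully present."""
    return {str(g): float(np.median(values[labels == g])) for g in np.unique(labels)}


labels = np.array(["b", "a", "c", "a", "b", "c", "a", "b"])
values = np.array([10.0, 1.0, 100.0, 3.0, 30.0, 300.0, 2.0, 20.0])

indexer = group_indexer(labels)
chunks = pack_into_chunks(indexer, target_size=5)

# Reference answer computed on the whole (unchunked) array.
expected = groupwise_median(labels, values)

# "map_blocks"-style pass: apply the groupby independently to each shuffled chunk.
blockwise = {}
for positions in chunks:
    blockwise.update(groupwise_median(labels[positions], values[positions]))

assert blockwise == expected
```

Here groups `b` and `c` end up sharing a chunk while `a` gets its own, which is the "multiple groups in a single chunk" case. The same per-block pattern is what options (3) and (4) would rely on.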
Thoughts?
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response