|
| 1 | +--- |
| 2 | +title: 'Grouping by multiple arrays with Xarray' |
| 3 | +date: '2023-07-18' |
| 4 | +authors: |
| 5 | + - name: Deepak Cherian |
| 6 | + github: dcherian |
| 7 | + |
| 8 | +summary: 'Xarray finally supports grouping by multiple arrays. 🎉' |
| 9 | +--- |
| 10 | + |
| 11 | +## TLDR |
| 12 | + |
| 13 | +Xarray now supports grouping by multiple variables ([docs](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables)). 🎉 😱 🤯 🥳. Try it out! |
| 14 | + |
| 15 | +## How do I use it? |
| 16 | + |
| 17 | +Install `xarray>=2024.08.0` and optionally [flox](https://flox.readthedocs.io/en/latest/) for better performance with reductions. |
| 18 | + |
| 19 | +## Simple example |
| 20 | + |
| 21 | +Set up a multi-variable groupby by passing [Grouper objects](https://docs.xarray.dev/en/latest/user-guide/groupby.html#grouping-by-multiple-variables) as keyword arguments. |
| 22 | + |
| 23 | +```python |
|  | +import numpy as np |
| 24 | +import xarray as xr |
| 25 | +from xarray.groupers import UniqueGrouper |
| 26 | + |
| 27 | +da = xr.DataArray( |
| 28 | + np.array([1, 2, 3, 0, 2, np.nan]), |
| 29 | + dims="d", |
| 30 | + coords=dict( |
| 31 | + labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])), |
| 32 | + labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])), |
| 33 | + ), |
| 34 | +) |
| 35 | + |
| 36 | +gb = da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper()) |
| 37 | +gb |
| 38 | +``` |
| 39 | + |
| 40 | +``` |
| 41 | +<DataArrayGroupBy, grouped over 2 grouper(s), 9 groups in total: |
| 42 | + 'labels1': 3 groups with labels 'a', 'b', 'c' |
| 43 | + 'labels2': 3 groups with labels 'x', 'y', 'z'> |
| 44 | +``` |
| 45 | + |
| 46 | +Reductions work as usual: |
| 47 | + |
| 48 | +```python |
| 49 | +gb.mean() |
| 50 | +``` |
| 51 | + |
| 52 | +``` |
| 53 | +<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B |
| 54 | +array([[1. , nan, nan], |
| 55 | + [nan, 2. , nan], |
| 56 | + [nan, nan, 1.5]]) |
| 57 | +Coordinates: |
| 58 | + * labels1 (labels1) object 24B 'a' 'b' 'c' |
| 59 | + * labels2 (labels2) object 24B 'x' 'y' 'z' |
| 60 | +``` |
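
To make the shape of that result concrete, here is a plain-Python sketch of what a mean over two label arrays computes. It is not how Xarray implements the reduction; `grouped_mean` is a hypothetical helper, and it assumes NaN values are skipped, which is consistent with the `1.` reported for the `('a', 'x')` group above.

```python
import math
from collections import defaultdict
from itertools import product


def grouped_mean(values, *label_arrays):
    """Mean of `values` for every combination of unique labels (hypothetical helper)."""
    buckets = defaultdict(list)
    for value, *keys in zip(values, *label_arrays):
        if not math.isnan(value):  # assumption: NaNs are skipped, as in the output above
            buckets[tuple(keys)].append(value)

    # The result spans the full Cartesian product of unique labels,
    # so combinations that never occur in the data come out as NaN.
    uniques = [sorted(set(labels)) for labels in label_arrays]
    return {
        combo: (sum(buckets[combo]) / len(buckets[combo]) if buckets[combo] else math.nan)
        for combo in product(*uniques)
    }


values = [1.0, 2.0, 3.0, 0.0, 2.0, math.nan]
labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]

means = grouped_mean(values, labels1, labels2)
populated = {combo: m for combo, m in means.items() if not math.isnan(m)}
print(len(means))  # 9 combinations in total
print(populated)   # {('a', 'x'): 1.0, ('b', 'y'): 2.0, ('c', 'z'): 1.5}
```

Only three of the nine label combinations occur in the data, which is why the 3×3 result is NaN everywhere except on the diagonal.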
| 61 | + |
| 62 | +So does `map`: |
| 63 | + |
| 64 | +```python |
| 65 | +gb.map(lambda x: x[0]) |
| 66 | +``` |
| 67 | + |
| 68 | +``` |
| 69 | +<xarray.DataArray (labels1: 3, labels2: 3)> Size: 72B |
| 70 | +array([[ 1., nan, nan], |
| 71 | + [nan, 2., nan], |
| 72 | + [nan, nan, 3.]]) |
| 73 | +Coordinates: |
| 74 | + * labels1 (labels1) object 24B 'a' 'b' 'c' |
| 75 | + * labels2 (labels2) object 24B 'x' 'y' 'z' |
| 76 | +``` |
| 77 | + |
| 78 | +## Multiple Groupers |
| 79 | + |
| 80 | +Different grouper types can be combined: categorical grouping with |
| 81 | +`UniqueGrouper`, binning with `BinGrouper`, and resampling with |
| 82 | +`TimeResampler`. |
| 83 | + |
| 93 | +```python |
| 94 | +from xarray.groupers import BinGrouper |
| 95 | + |
| 96 | +ds = xr.Dataset( |
| 97 | +    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))}, |
| 98 | +    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))}, |
| 99 | +) |
| 100 | +gb = ds.foo.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()) |
| 101 | +gb |
| 102 | +``` |
| 103 | + |
| 104 | +``` |
| 105 | +<DataArrayGroupBy, grouped over 2 grouper(s), 4 groups in total: |
| 106 | + 'x_bins': 2 groups with labels (5,, 15], (15,, 25] |
| 107 | + 'letters': 2 groups with labels 'a', 'b'> |
| 108 | +``` |
| 109 | + |
| 110 | +```python |
| 111 | +gb.mean() |
| 112 | +``` |
| 113 | + |
| 114 | +``` |
| 115 | +<xarray.DataArray 'foo' (x_bins: 2, letters: 2, y: 3)> Size: 96B |
| 116 | +array([[[ 0.,  1.,  2.], |
| 117 | +        [nan, nan, nan]], |
| 118 | + |
| 119 | +       [[nan, nan, nan], |
| 120 | +        [ 3.,  4.,  5.]]]) |
| 121 | +Coordinates: |
| 122 | + * x_bins (x_bins) object 16B (5, 15] (15, 25] |
| 123 | + * letters (letters) object 16B 'a' 'b' |
| 124 | +Dimensions without coordinates: y |
| 125 | +``` |
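
The pattern of NaNs above follows from how each `x` value lands in a bin. The sketch below reproduces that assignment in plain Python. It assumes right-closed intervals, as the `(5, 15]` labels suggest; `assign_bin` is a hypothetical helper, not part of Xarray.

```python
from bisect import bisect_left


def assign_bin(x, edges):
    """Return the right-closed interval (lo, hi] containing x, or None (hypothetical helper)."""
    idx = bisect_left(edges, x) - 1
    if 0 <= idx < len(edges) - 1:
        return (edges[idx], edges[idx + 1])
    return None  # x lies outside every bin


edges = [5, 15, 25]
x = [10, 20, 30, 40]
letters = list("abba")

# Map each (bin, letter) combination to the positions along x that belong to it.
groups = {}
for position, (xi, letter) in enumerate(zip(x, letters)):
    interval = assign_bin(xi, edges)
    if interval is not None:
        groups.setdefault((interval, letter), []).append(position)

print(groups)  # {((5, 15), 'a'): [0], ((15, 25), 'b'): [1]}
```

Under these assumptions, `x = 30` and `x = 40` fall outside every bin and are dropped, so only rows 0 and 1 of `foo` (`[0, 1, 2]` and `[3, 4, 5]`) contribute, matching the two non-NaN slices in the result.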