Description
Is your feature request related to a problem?
It is pretty common to want to run cumsum and have the sum reset when a boolean flag array is 1. This is so common it has its own Wikipedia page ("Segmented scan") and is discussed in Blelloch (1993) (Section 1.5).
Here's a real example of someone trying to implement it in a fairly roundabout way.
time_cumsum = cube.cumsum(dim="time")
cumsum = time_cumsum - time_cumsum.where(cube == 0).ffill(dim="time").fillna(0)
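To make the intent of that snippet concrete, here is a minimal pure-Python sketch of the same three steps (running total, forward-filled baseline taken at the zeros, subtraction). It assumes a 1-D list where the data doubles as the flag (a 0 resets the count), and the function name `reset_cumsum_roundabout` is made up for illustration; it is not xarray API.

```python
from itertools import accumulate


def reset_cumsum_roundabout(values):
    """Mimic cumsum - cumsum.where(values == 0).ffill().fillna(0) on a list."""
    running = list(accumulate(values))  # plain cumulative sum

    # Baseline: the running total at the most recent zero, or 0 before any zero.
    # This plays the role of where(...).ffill(...).fillna(0).
    baseline = []
    last_total_at_zero = 0
    for value, total in zip(values, running):
        if value == 0:
            last_total_at_zero = total
        baseline.append(last_total_at_zero)

    # Subtracting the baseline restarts the count after every zero.
    return [total - base for total, base in zip(running, baseline)]


cube = [1, 1, 0, 1, 1, 1, 0, 0, 1]
print(reset_cumsum_roundabout(cube))  # [1, 2, 0, 1, 2, 3, 0, 0, 1]
```

Note that this needs two passes plus a fill, which is part of why a dedicated single-scan operation is attractive.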
We have a few options to implement it:
1. We could introduce a new method DataArray.segmented_scan(flags, op="sum") or a new class DataArray.segment.cumsum()? A dask/cubed-friendly version that does all of this in a single scan should be fairly straightforward to write (and similar to our ffill, bfill wrappers).
2. In a way this generalizes resample, and it just struck me that the example above could be written as the following, which should be OK once flox adds scans:

       group_idx = (cube == 0).cumsum('time')
       cube.groupby(group_idx).cumsum()

   i. We could use our new Grouper functionality to expose a "flag" grouper that hides the group_idx = (cube == 0).cumsum('time') line.
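As a rough illustration of the "single scan" idea in option 1, a segmented sum can be phrased as an ordinary scan over (flag, value) pairs with an associative combine operator, which is what makes a chunked dask/cubed version plausible. The sketch below is pure Python on lists; `combine` and `segmented_cumsum` are hypothetical names, not existing xarray, dask, or flox functions, and it assumes a set flag marks the first element of a new segment.

```python
from itertools import accumulate


def combine(left, right):
    """Associative operator on (flag, value) pairs for a segmented sum."""
    left_flag, left_value = left
    right_flag, right_value = right
    if right_flag:
        # A set flag on the right starts a new segment: discard the left total.
        return (True, right_value)
    # Otherwise keep accumulating, and remember whether a flag was seen so far.
    return (left_flag, left_value + right_value)


def segmented_cumsum(values, flags):
    """Cumulative sum of `values` that restarts wherever `flags` is truthy."""
    pairs = zip((bool(f) for f in flags), values)
    return [value for _, value in accumulate(pairs, combine)]


values = [3, 1, 4, 1, 5, 9, 2, 6]
flags = [0, 0, 1, 0, 0, 1, 0, 0]
print(segmented_cumsum(values, flags))  # [3, 4, 4, 5, 10, 9, 11, 17]
```

Because `combine` is associative, per-chunk results can in principle be combined in any grouping, which is the property a parallel scan relies on.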
My concern with (2) and (2.i) is that they are not at all obvious for most of our userbase.
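For readers who do not find (2) obvious, the trick is that a cumulative sum of the flags yields an integer label that increases by one at every reset point, so a grouped cumsum over those labels restarts exactly there. The following is a small pure-Python stand-in for that idea, assuming 1-D lists; `group_index` and `grouped_cumsum` are illustrative names only, and the real version would go through xarray's groupby and flox.

```python
from itertools import accumulate, groupby


def group_index(flags):
    """Running count of set flags; each set flag opens a new group label."""
    return list(accumulate(int(bool(f)) for f in flags))


def grouped_cumsum(values, group_idx):
    """Cumulative sum of `values` computed separately within each label run."""
    out = []
    # group_idx is non-decreasing, so consecutive runs are the groups.
    for _, run in groupby(zip(group_idx, values), key=lambda pair: pair[0]):
        out.extend(accumulate(value for _, value in run))
    return out


cube = [1, 1, 0, 1, 1, 1, 0, 0, 1]
group_idx = group_index(value == 0 for value in cube)
print(group_idx)                        # [0, 0, 1, 1, 1, 1, 2, 3, 3]
print(grouped_cumsum(cube, group_idx))  # [1, 2, 0, 1, 2, 3, 0, 0, 1]
```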