Support segmented scans #9229

Open
@dcherian

Is your feature request related to a problem?

It is pretty common to want to run cumsum and have the sum reset wherever a boolean flag array is 1. This operation, the segmented scan, is common enough to have its own Wikipedia page, and it is discussed in Blelloch (1993), Section 1.5.

Here's a real example of someone trying to implement it in a fairly roundabout way.

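# cube: DataArray with a 'time' dimension; zeros mark where the sum should reset
time_cumsum = cube.cumsum(dim="time")
# subtract the running total at the most recent zero so the sum restarts there
cumsum = time_cumsum - time_cumsum.where(cube == 0).ffill(dim="time").fillna(0)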
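
To make explicit what that workaround computes, here is a minimal plain-Python sketch of the same arithmetic on a 1-D series. The function name reset_cumsum_via_ffill and the sample data are made up for illustration; it ignores NaNs and multiple dimensions, which the xarray version handles.

```python
from itertools import accumulate


def reset_cumsum_via_ffill(values):
    """Plain-Python mirror of the cumsum / where / ffill / fillna trick (1-D only)."""
    running = list(accumulate(values))  # plays the role of cube.cumsum(dim="time")
    offsets = []
    last = 0  # plays the role of fillna(0) before the first zero
    for value, total in zip(values, running):
        if value == 0:
            # where(cube == 0) keeps the running total here; ffill carries it forward
            last = total
        offsets.append(last)
    return [total - offset for total, offset in zip(running, offsets)]


print(reset_cumsum_via_ffill([1, 2, 0, 3, 4, 0, 5]))  # [1, 3, 0, 3, 7, 0, 5]
```

The result restarts at each zero, but it takes a full cumsum plus a forward fill to get there, which is the roundabout part.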

We have a few options to implement it:

  1. We could introduce a new method, DataArray.segmented_scan(flags, op="sum"), or perhaps a new class exposed as DataArray.segment.cumsum(). A dask/cubed-friendly version that does all of this in a single scan should be fairly straightforward to write (and similar to our ffill and bfill wrappers).

  2. In a way this generalizes resample, and it just struck me that the example above could be written as the following, which should be OK once flox adds scans:

    group_idx = (cube == 0).cumsum('time')
    cube.groupby(group_idx).cumsum()
    i. We could use our new Grouper functionality to expose a "flag" grouper that hides the group_idx = (cube == 0).cumsum('time') line.
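
For option 1, the reason a single scan is enough is that a segmented sum can be phrased as an ordinary scan over (flag, value) pairs with an associative combine operator, in the spirit of the Blelloch reference. Below is a minimal plain-Python sketch of that idea. The names combine and segmented_cumsum are hypothetical, this is not an existing xarray, dask, or cubed API, and it assumes a flag of 1 marks the first element of a new segment.

```python
from itertools import accumulate


def combine(left, right):
    """Associative operator on (flag, partial_sum) pairs.

    If the right-hand piece starts a new segment, its sum wins;
    otherwise the two partial sums are added.
    """
    left_flag, left_sum = left
    right_flag, right_sum = right
    return (left_flag or right_flag, right_sum if right_flag else left_sum + right_sum)


def segmented_cumsum(values, flags):
    """Inclusive cumulative sum that restarts wherever flags is truthy."""
    pairs = zip((bool(f) for f in flags), values)
    return [partial for _, partial in accumulate(pairs, combine)]


values = [1, 2, 0, 3, 4, 0, 5]
flags = [v == 0 for v in values]  # same role as cube == 0
print(segmented_cumsum(values, flags))  # [1, 3, 0, 3, 7, 0, 5]
```

Because combine is associative, the same operator could in principle be handed to a parallel or blockwise scan rather than the sequential accumulate used here.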

My concern with (2) and (2.i) is that they are not at all obvious for most of our userbase.
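
For anyone unfamiliar with the trick behind (2), here is a minimal plain-Python sketch of why a cumulative sum of the flags works as a group index. The function name cumsum_by_group_index is made up for illustration, and itertools.groupby stands in for what xarray's groupby would do on real data.

```python
from itertools import accumulate, groupby


def cumsum_by_group_index(values, flags):
    """Cumulative sum within groups labelled by the running count of flags."""
    # Each truthy flag bumps the label, so a new group starts at every flag.
    group_idx = list(accumulate(int(bool(f)) for f in flags))
    out = []
    # Labels are non-decreasing, so each group is one consecutive run.
    for _, members in groupby(zip(group_idx, values), key=lambda pair: pair[0]):
        out.extend(accumulate(value for _, value in members))
    return out


values = [1, 2, 0, 3, 4, 0, 5]
flags = [v == 0 for v in values]  # same role as cube == 0
print(cumsum_by_group_index(values, flags))  # [1, 3, 0, 3, 7, 0, 5]
```

Here the group labels come out as [0, 0, 1, 1, 1, 2, 2], and a per-group cumsum then gives the segmented result.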
