[BUG] Rolling window aggregations are very slow with large windows #15119

Open
@shwina

Description

With large windows, the .rolling() function in cuDF can be pathologically slow:

In [6]: dt = cudf.date_range("2001-01-01", "2002-01-01", freq="1s")
In [7]: df = cudf.DataFrame({"x": np.random.rand(len(dt))}, index=dt)
In [8]: %time df.rolling("1D").sum()
CPU times: user 10.3 s, sys: 57.1 ms, total: 10.3 s
Wall time: 10.4 s
Out[8]:
                                x
2001-01-01 00:00:00      0.815418
2001-01-01 00:00:01      1.238151
2001-01-01 00:00:02      1.811390
2001-01-01 00:00:03      2.065794
2001-01-01 00:00:04      2.195230
...                           ...
2001-12-31 23:59:55  43308.909704
2001-12-31 23:59:56  43309.098228
2001-12-31 23:59:57  43308.658888
2001-12-31 23:59:58  43308.790256
2001-12-31 23:59:59  43308.915838

[31536000 rows x 1 columns]

Why is it slow?

Of the roughly 10 s of execution time above, about 8 s is spent computing the window sizes, which is done in a hand-rolled numba CUDA kernel:

def gpu_window_sizes_from_offset(arr, window_sizes, offset):

Note that running the code through a profiler will show the execution time being spent in the next CUDA kernel (column.full), but I think that's a red herring: there's no synchronization after the numba kernel call, so its run time can show up under whatever call next waits on the GPU.

What can we do about it?

I see a couple of options here:

  1. I wonder if there's a better way to write that kernel. Currently, it naively launches one thread per element, and does a linear search for the next element that would exceed the window bounds.
  2. We could make it libcudf's responsibility to compute the window sizes. I believe it already computes window sizes in the context of grouped rolling window aggregations: see grouped_range_rolling_window().
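
To illustrate option 1, here is a minimal CPU-side sketch in NumPy, not the cuDF kernel itself. It assumes the index is sorted in ascending order and that the window for row i covers every earlier-or-equal row j with arr[i] - arr[j] < offset. The function names window_sizes_linear and window_sizes_binary_search are hypothetical and exist only for this example. The first function is a per-element linear scan in the spirit of the description above; the second replaces the scan with a binary search via np.searchsorted.

```python
import numpy as np


def window_sizes_linear(arr, offset):
    """Reference version: for each i, walk backwards until the window bound is hit.

    Cost is O(n * w), where w is the number of rows in a window.
    """
    n = len(arr)
    sizes = np.empty(n, dtype=np.int64)
    for i in range(n):
        j = i
        # Step back while row j is still inside the window of row i.
        while j >= 0 and arr[i] - arr[j] < offset:
            j -= 1
        sizes[i] = i - j
    return sizes


def window_sizes_binary_search(arr, offset):
    """Same result, using one binary search per element (O(n log n) overall).

    Assumes arr is sorted ascending. For row i, the first row inside the
    window is the first index whose value is strictly greater than
    arr[i] - offset, which is what side="right" returns.
    """
    arr = np.asarray(arr)
    starts = np.searchsorted(arr, arr - offset, side="right")
    return np.arange(len(arr), dtype=np.int64) - starts + 1


# Small example with integer "timestamps" and a window offset of 3 units.
timestamps = np.array([0, 1, 2, 3, 10, 11], dtype=np.int64)
print(window_sizes_linear(timestamps, 3))
print(window_sizes_binary_search(timestamps, 3))
```

For the example in this issue (a 1-day window over 1-second data), a backward linear scan visits up to 86,400 rows per element, while a binary search over the 31,536,000-row index needs roughly 25 steps per element. The same idea should carry over to a one-thread-per-element GPU kernel, though that is untested here.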


Labels: Performance, bug, libcudf
