[BUG] Rolling window aggregations are very slow with large windows #15119
With large windows, the .rolling() function in cuDF can be pathologically slow:
```
In [6]: dt = cudf.date_range("2001-01-01", "2002-01-01", freq="1s")

In [7]: df = cudf.DataFrame({"x": np.random.rand(len(dt))}, index=dt)

In [8]: %time df.rolling("1D").sum()
CPU times: user 10.3 s, sys: 57.1 ms, total: 10.3 s
Wall time: 10.4 s
Out[8]:
                                x
2001-01-01 00:00:00      0.815418
2001-01-01 00:00:01      1.238151
2001-01-01 00:00:02      1.811390
2001-01-01 00:00:03      2.065794
2001-01-01 00:00:04      2.195230
...                           ...
2001-12-31 23:59:55  43308.909704
2001-12-31 23:59:56  43309.098228
2001-12-31 23:59:57  43308.658888
2001-12-31 23:59:58  43308.790256
2001-12-31 23:59:59  43308.915838

[31536000 rows x 1 columns]
```
Why is it slow?
Of the 10s of execution time above, about 8s is spent computing the window sizes, which is done in a hand-rolled numba CUDA kernel:

cudf/python/cudf/cudf/utils/cudautils.py, line 17 in 6f6e521

(Profiling also attributes some time to column.full) - but that's a red herring I think, because there's no synchronization after the numba kernel call, so the kernel's execution time gets charged to whatever synchronizes next.
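Roughly speaking, the kernel assigns one thread per output row and scans linearly for the window boundary. The following is a minimal illustrative sketch of that approach (the name and launch details are made up for illustration; this is not the actual cudautils.py code):

```python
import numpy as np
from numba import cuda

@cuda.jit
def window_sizes_linear(index, offset, out):
    # One thread per row: walk backwards until the previous timestamp
    # falls outside the (right-closed) window. With n rows and windows
    # spanning up to w rows, this is O(n * w) work in total, which is
    # why a one-day window over one-second data is so expensive.
    i = cuda.grid(1)
    if i < index.shape[0]:
        j = i
        while j > 0 and index[i] - index[j - 1] < offset:
            j -= 1
        out[i] = i - j + 1

# Example launch on int64 nanosecond timestamps with a one-day window:
ts = cuda.to_device(np.arange(100_000, dtype=np.int64) * 10**9)
out = cuda.device_array(100_000, dtype=np.int32)
window_sizes_linear.forall(100_000)(ts, np.int64(86_400 * 10**9), out)
```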
What can we do about it?
I see a couple of options here:
- I wonder if there's a better way to write that kernel (see the sketch after this list). Currently, it naively launches one thread per element and does a linear search for the next element that would exceed the window bounds.
- We could make it libcudf's responsibility to compute the window sizes. I believe it already does window-size computation in the context of grouped rolling window aggregations: see grouped_range_rolling_window().
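On the first option: because the index is sorted, the start of each window can be found with a vectorized binary search rather than a per-thread linear scan, turning the O(n * w) scans into roughly O(n log n) work. A minimal sketch of that idea, assuming int64 nanosecond timestamps and a right-closed window (CuPy is used purely for illustration, and the helper name is made up):

```python
import cupy as cp

def window_sizes_searchsorted(ts_ns, offset_ns):
    # Row i's window covers rows j with ts[i] - offset < ts[j] <= ts[i].
    # Because ts_ns is sorted, the first such j for every row is found
    # with a single vectorized binary search instead of per-thread scans.
    starts = cp.searchsorted(ts_ns, ts_ns - offset_ns, side="right")
    return cp.arange(ts_ns.size) - starts + 1

# Example: one-second timestamps, one-day window.
ts = cp.arange(100_000, dtype=cp.int64) * 10**9
sizes = window_sizes_searchsorted(ts, 86_400 * 10**9)
```

A libcudf-side implementation (the second option) could use the same formulation while keeping the work entirely off the Python side.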