
[BUG] Python groupby rolling aggregations return index inconsistent with pandas #10249

Open
@beckernick

Description


In cuDF, Python groupby rolling aggregations return a single-level Index that holds only the original row position of each element. pandas instead returns a MultiIndex that includes both the groupby key(s) and the original row position.

This is not currently blocking any behavior with Dask + cuDF, because grouped rolling operations are themselves blocked by #10173.

import pandas as pd
import cudf
import numpy as np

df = cudf.datasets.randomdata(nrows=100000)
pdf = df.to_pandas()

# pandas: index is a MultiIndex of (id, original row position)
print(pdf.groupby(['id']).rolling(window=3).x.mean().head())
# cuDF: index holds only the original row position
print(df.groupby(['id']).rolling(window=3).x.mean().head())
id        
879  43605   NaN
881  3941    NaN
882  29855   NaN
884  14616   NaN
     70864   NaN
Name: x, dtype: float64
43605    <NA>
3941     <NA>
29855    <NA>
14616    <NA>
70864    <NA>
Name: x, dtype: float64
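
To make the expected index shape concrete, here is a minimal pandas-only sketch. It does not call cuDF: the flat result is emulated by dropping the `id` level from the pandas output, and `restore_group_level` is a hypothetical helper, not part of pandas or cuDF. It assumes the original row labels are unique and that the flat result is already in the same order as the pandas result, as it appears to be in the output above.

```python
import pandas as pd

# Small deterministic frame standing in for the random data in the report.
pdf = pd.DataFrame(
    {"id": [2, 1, 2, 1, 2, 1], "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
)

# pandas behaviour: the index is a MultiIndex of (group key, original row label).
expected = pdf.groupby("id").rolling(window=2)["x"].mean()

# Emulate the flat result described in the report: same values, same order,
# but only the original row label is left in the index.
flat = expected.droplevel("id")


def restore_group_level(flat_result, keys):
    """Hypothetical helper: prepend the group key as an outer index level.

    Assumes the labels in flat_result.index are unique labels of `keys`.
    """
    key_values = keys.loc[flat_result.index]
    index = pd.MultiIndex.from_arrays(
        [key_values.to_numpy(), flat_result.index],
        names=[keys.name, flat_result.index.name],
    )
    return pd.Series(flat_result.to_numpy(), index=index, name=flat_result.name)


restored = restore_group_level(flat, pdf["id"])
print(restored)
```

Whether the same reconstruction is a practical workaround on a cuDF Series is an assumption that would need checking on a GPU; the sketch only shows which index pandas users expect.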


Labels: Python (Affects Python cuDF API.), bug (Something isn't working), dask (Dask issue), good first issue (Good for newcomers)
