Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rolling aggregate support based on windows within a DT #1500

Open
atroiano opened this issue Dec 20, 2018 · 5 comments
Open

Rolling aggregate support based on windows within a DT #1500

atroiano opened this issue Dec 20, 2018 · 5 comments
Labels
cust-goldmansachs new feature Feature requests for new functionality

Comments

@atroiano
Copy link

atroiano commented Dec 20, 2018

I'd like to see the ability to get different rolling aggregations of my dataset based on order and grouping columns. Pandas has robust support for these type of actions. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html.
It would also be nice to easily assign these to new columns without having to make nested for loops.
Below is a function I use in Pandas to achieve this functionality

def create_roll_columns (x,g_c,roll,roll_cols,roll_types):
    #roll types [sum,mean,min]
    #g_c needs to be a list
    #lag will set the number of lag cols
    for i in roll_cols:
        for j in range(2,roll+1):
            for aggregation in roll_types:
                if aggregation == 'sum':
                    nm = '{0}_sum_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).sum().reset_index(0,drop=True)
                if aggregation == 'mean':
                    nm = '{0}_mean_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).mean().reset_index(0,drop=True)
                if aggregation == 'min':
                    nm = '{0}_min_{1}'.format(i, str(j))
                    x[nm] = x.groupby(g_c)[i].rolling(j, min_periods=1).min().reset_index(0,drop=True)              
    return(x)  

Example code could be something like - df(select=mean(f.x), group="y",window = 3, align='r')
-group = windowing columns for the select statement
-window = many rows is looks back or forward in a given group

  • Align is right, left, center and coordinates where the window happens relative to the given record.
@atroiano atroiano changed the title Lead/Lag and rolling aggregate support based on windows Lead/Lag and rolling aggregate support based on windows within a DT Dec 20, 2018
@st-pasha st-pasha added the improve Improvement of an existing functionality label Dec 20, 2018
@st-pasha st-pasha self-assigned this Dec 20, 2018
@st-pasha
Copy link
Contributor

I'm going to remove lead/lag from here, since it is a separate feature, and deserves its own issue.

The rolling functions are an important feature to add; there is a similar work being done in R data.table as well: Rdatatable/data.table#2778

@jangorecki
Copy link
Contributor

The R data.table implementation (as of now rollmean) was designed to be R agnostic, so we should be able to reduce maintanence effort by reusing that code.

@atroiano
Copy link
Author

@st-pasha Do you want me to raise a ticket about lead/lag?

@st-pasha
Copy link
Contributor

@atroiano Yes, please do. I guess the function name will be shift(n), in concord with R data.table, and also because it nicely allows both positive and negative ns.

@st-pasha st-pasha changed the title Lead/Lag and rolling aggregate support based on windows within a DT Rolling aggregate support based on windows within a DT Dec 21, 2018
@st-pasha st-pasha added new feature Feature requests for new functionality and removed improve Improvement of an existing functionality labels Feb 19, 2019
@Viktor-Demin
Copy link

In R we use zoo or own implementation of rolling function with next syntax:

DT[, ref_price := roll(price, fill = T, width = 16, fun = max), by = .(banner, region, sku)]

Is there way to implement with similar syntax?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cust-goldmansachs new feature Feature requests for new functionality
Projects
None yet
Development

No branches or pull requests

5 participants