Datetime Index support for sklego.pandas_utils.add_lags [FEATURE] #428

AlanGanem · 2020-12-22T14:44:51Z

Please explain clearly what you'd like to see added.

When working with hierarchical and grouped time series (HTS and GTS), I've stumbled upon some common issues:

Usually for GTS and HTS, the time series for all groups are in the same dataframe, so the shift operations have to be performed groupwise.
There might be missing data for specific days, which makes the naive lag operation (using DataFrame.shift) inconsistent: shifting by n rows is no longer the same as shifting by n days.
You may want to create resampled (downsampled) lagged features while preserving the original data frequency (for instance, a daily prediction that uses last week's mean as a feature).
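The first two points can be illustrated with a minimal sketch. This is not the implementation mentioned in this issue and not part of sklego; `add_group_lags`, its parameters, and the `store`/`date`/`sales` columns are hypothetical names chosen for illustration. It assumes daily data with at most one row per group and date, and it shifts the dates rather than the rows, so a lag of one always means "the value one day earlier in the same group".

```python
import pandas as pd


def add_group_lags(df, group_col, date_col, value_col, lags):
    """Hypothetical helper: date-aware daily lags computed per group.

    Assumes at most one row per (group, date) pair.
    """
    out = df.copy()
    for lag in lags:
        # Move every observation `lag` days into the future, then join it
        # back on (group, date). A missing day yields NaN instead of
        # silently borrowing the previous row, and groups never mix.
        shifted = df[[group_col, date_col, value_col]].copy()
        shifted[date_col] = shifted[date_col] + pd.Timedelta(days=lag)
        shifted = shifted.rename(columns={value_col: f"{value_col}_lag{lag}"})
        out = out.merge(shifted, on=[group_col, date_col], how="left")
    return out


df = pd.DataFrame(
    {
        "store": ["a", "a", "a", "b", "b"],
        # 2020-01-03 is missing for store "a"
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-01", "2020-01-02"]
        ),
        "sales": [1.0, 2.0, 4.0, 10.0, 20.0],
    }
)

lagged = add_group_lags(df, "store", "date", "sales", lags=[1])
naive = df["sales"].shift(1)  # row-based shift, ignores groups and gaps
```

In this toy frame the naive shift assigns store "a"'s last value to store "b"'s first row and treats 2020-01-02 as "yesterday" for 2020-01-04, while the date-based join leaves both of those entries as NaN.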

I already have this implemented as a function.

convince us of the use-case, we're open to many suggestions but we prefer to solve problems with pipelines that are at least somewhat general
add a screenshot if applicable (ML stuff is hard to explain with words, pictures say 1000 words)
make sure that the feature you want is not already supported by sklearn

koaning · 2020-12-22T15:59:43Z

This sounds like an imputation combined with our GroupedTransformer. I'm not 100% sure if the transformer has any notion of hierarchy. I do know that our grouped predictor does have this feature, see shrinkage param in API.

AlanGanem · 2020-12-22T17:49:03Z

It indeed resembles part of the GroupedTransformer functionality, although I'm not sure how that would work with "callable transformers" instead of objects containing fit and transform methods (maybe a simple wrapper would suffice in this case).
Still, the remaining features that I believe are not available in the current implementation of sklego.pandas_utils.add_lags are:

The imputation part is a bit different, since the imputations are row imputations (adding rows for dates that do not exist) and not value imputations (in the sense of filling in NaNs). Resampling to a desired frequency and filling the gaps does the job.
There are also the features at a different time frequency (downsampled). I'm not sure if that would be generally useful, but it did help me a lot in time series feature engineering, alongside rolling operations.
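A rough sketch of both ideas, under stated assumptions: `fill_missing_dates` and `add_prev_week_mean` are hypothetical helpers (not sklego functions and not the function mentioned in this issue), the data is daily with unique dates per group, and weeks run Monday to Sunday. The first helper adds the missing date rows per group; the second attaches the previous calendar week's mean to every daily row.

```python
import pandas as pd


def fill_missing_dates(df, group_col, date_col, freq="D"):
    """Hypothetical helper: add a NaN row for each missing date, per group."""
    parts = []
    for key, grp in df.groupby(group_col):
        # asfreq reindexes from the group's first to last date at `freq`
        filled = grp.set_index(date_col).asfreq(freq)
        filled[group_col] = key  # restore the group label on the new rows
        parts.append(filled.reset_index())
    return pd.concat(parts, ignore_index=True)


def add_prev_week_mean(df, group_col, date_col, value_col):
    """Hypothetical helper: previous calendar week's mean on every daily row."""
    out = df.copy()
    # Label each row with the Monday that starts its week
    out["_week"] = out[date_col].dt.to_period("W").dt.start_time
    weekly = (
        out.groupby([group_col, "_week"])[value_col]
        .mean()  # NaN rows are skipped by default
        .rename(f"{value_col}_prev_week_mean")
        .reset_index()
    )
    # Move each weekly mean one week forward so it lines up with the next week
    weekly["_week"] = weekly["_week"] + pd.Timedelta(weeks=1)
    out = out.merge(weekly, on=[group_col, "_week"], how="left")
    return out.drop(columns="_week")


df = pd.DataFrame(
    {
        "store": ["a", "a", "a", "a"],
        "date": pd.to_datetime(
            ["2020-01-06", "2020-01-07", "2020-01-13", "2020-01-15"]
        ),
        "sales": [2.0, 4.0, 10.0, 20.0],
    }
)

filled = fill_missing_dates(df, "store", "date")
features = add_prev_week_mean(filled, "store", "date", "sales")
```

Joining on the shifted week label, rather than shifting the weekly rows, keeps the feature correct even when an entire week is absent from a group.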

About the shrinkage parameter, it looks very interesting! I only mentioned hierarchical time series as a use case for the feature request. The proposed feature does not explicitly handle hierarchies; only the hierarchy "leaves" are naturally taken into account by the groupwise transformation.

Do these features still make sense?

AlanGanem added the enhancement New feature or request label Dec 22, 2020

AlanGanem changed the title ~~Datetime Index features for sklego.pandas_utils.add_lags [FEATURE]~~ Datetime Index support for sklego.pandas_utils.add_lags [FEATURE] Dec 22, 2020

