Datetime Index support for sklego.pandas_utils.add_lags [FEATURE] #428

Open
1 of 3 tasks
AlanGanem opened this issue Dec 22, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@AlanGanem

AlanGanem commented Dec 22, 2020

Please explain clearly what you'd like to see added.

When working with hierarchical and grouped time series (HTS and GTS), I've stumbled upon some common issues:

  1. Usually, for GTS and HTS, the time series for all groups are in the same DataFrame, so the shift operations have to be performed groupwise.
  2. There may be missing data for specific days, which makes the naive approach to lagged features (using pandas' .shift()) inconsistent.
  3. You may want to create resampled (downsampled) lagged features while preserving the default data frequency (for instance, a daily prediction that uses last week's mean as a feature).
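A minimal sketch of points 1 and 2, assuming a long-format DataFrame with hypothetical columns `store`, `date` and `sales` and at most one row per store and date: each group is reindexed onto a complete daily grid (row imputation), after which a plain groupwise shift lines up with the calendar. The helper `complete_date_rows` is illustrative only; it is neither sklego's `add_lags` nor the function mentioned in this issue.

```python
import pandas as pd


def complete_date_rows(df, group_col, date_col, freq="D"):
    """Row imputation (hypothetical helper): give every group one row per period
    between its first and last date. Inserted rows carry NaN in the value columns."""
    pieces = []
    for key, g in df.groupby(group_col, sort=False):
        g = g.set_index(date_col).sort_index()
        full_range = pd.date_range(g.index.min(), g.index.max(), freq=freq)
        g = g.reindex(full_range)  # assumes dates are unique within a group
        g.index.name = date_col
        g[group_col] = key  # inserted rows come back with NaN in the group column
        pieces.append(g)
    return pd.concat(pieces).reset_index()


df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04",
                            "2020-01-01", "2020-01-02"]),
    "sales": [1.0, 2.0, 4.0, 10.0, 20.0],
})

# Naive groupwise lag: store "a" on 2020-01-04 picks up the 2020-01-02 value.
naive_lag = df.groupby("store")["sales"].shift(1)

# Lag on the completed grid: the missing 2020-01-03 row yields NaN instead.
full = complete_date_rows(df, "store", "date")
full["sales_lag_1"] = full.groupby("store")["sales"].shift(1)
```

Whether the inserted rows should stay NaN or be filled (for example with 0 for days without sales) is a modelling choice that the sketch leaves open.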

I already have this implemented as a function.

  • convince us of the use case; we're open to many suggestions, but we prefer to solve problems with pipelines that are at least somewhat general
  • add a screenshot if applicable (ML stuff is hard to explain with words, pictures say 1000 words)
  • make sure that the feature you want is not already supported by sklearn
@AlanGanem AlanGanem added the enhancement New feature or request label Dec 22, 2020
@AlanGanem AlanGanem changed the title Datetime Index features for sklego.pandas_utils.add_lags [FEATURE] Datetime Index support for sklego.pandas_utils.add_lags [FEATURE] Dec 22, 2020
@koaning
Owner

koaning commented Dec 22, 2020

This sounds like an imputation step combined with our GroupedTransformer. I'm not 100% sure whether the transformer has any notion of hierarchy. I do know that our grouped predictor does have this feature; see the shrinkage param in the API.

@AlanGanem
Author

It does indeed resemble part of the GroupedTransformer functionality, although I'm not sure how that would work with "callable transformers" instead of objects containing fit and transform methods (maybe a simple wrapper would suffice in this case).
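On the wrapper question: scikit-learn's `FunctionTransformer` already turns a plain callable into an object with `fit`, `transform` and `get_params`. A small sketch, where `lag_column` is a hypothetical feature function; whether sklego's `GroupedTransformer` accepts such a wrapper, and how it passes each group's columns to the inner transformer, has not been checked here.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.preprocessing import FunctionTransformer


def lag_column(X, column, lag):
    """Hypothetical stateless feature function: append a row-based lag of `column`."""
    X = X.copy()  # leave the caller's DataFrame untouched
    X[f"{column}_lag_{lag}"] = X[column].shift(lag)
    return X


# FunctionTransformer supplies fit/transform (and get_params), so the callable
# behaves like an ordinary scikit-learn transformer and can be cloned.
lagger = FunctionTransformer(lag_column, kw_args={"column": "sales", "lag": 1})

X = pd.DataFrame({"sales": [1.0, 2.0, 4.0]})
X_out = clone(lagger).fit_transform(X)
```

Because the wrapped function is stateless, `fit` learns nothing; any per-group state would have to come from the meta-estimator that splits the data.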
Still, I believe the following features are not available in the current implementation of sklego.pandas_utils.add_lags:

  1. The imputation part is a bit different, since these are row imputations (adding non-existent date rows) rather than value imputations (in the sense of filling in NaNs). Resampling to the desired frequency and filling the gaps does the job.
  2. There are also the features at a different (downsampled) time frequency. I'm not sure whether that would be generally useful, but it helped me a lot in time series feature engineering, alongside rolling operations.
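A sketch of the downsampled lag feature, assuming hypothetical columns `store`, `date` and `sales` with daily rows: values are aggregated per store and calendar week, each week's aggregate is relabelled onto the following week, and the result is merged back onto the daily rows. `add_period_lag` is a hypothetical helper, not part of sklego; note that pandas' `"W"` periods end on Sunday.

```python
import pandas as pd


def add_period_lag(df, value_col, group_col, date_col, period="W", lag=1, agg="mean"):
    """Attach to every original row the aggregate of `value_col` over the calendar
    period `lag` steps back (e.g. last week's mean on daily rows)."""
    out = df.assign(_period=df[date_col].dt.to_period(period))
    # Downsample: one aggregated value per (group, period).
    per_period = (
        out.groupby([group_col, "_period"])[value_col]
        .agg(agg)
        .reset_index()
    )
    # Relabel period p as p + lag so it lines up with the rows it should feed.
    per_period["_period"] = per_period["_period"] + lag
    new_col = f"{value_col}_{period}_{agg}_lag_{lag}"
    per_period = per_period.rename(columns={value_col: new_col})
    # Broadcast back to the original (daily) rows; a missing period gives NaN.
    out = out.merge(per_period, on=[group_col, "_period"], how="left")
    return out.drop(columns="_period")


daily = pd.DataFrame({
    "store": ["a", "a", "a", "a", "b", "b"],
    "date": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-13",
                            "2020-01-14", "2020-01-06", "2020-01-13"]),
    "sales": [1.0, 3.0, 5.0, 7.0, 10.0, 20.0],
})
res = add_period_lag(daily, "sales", "store", "date")
```

Matching on period labels rather than shifting rows means that a week with no data gives NaN for the following week instead of borrowing an older week's value, which is the same consistency concern as with the daily lags.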

About the shrinkage parameter, it looks very interesting! I only mentioned hierarchical time series as a use case for the feature request. The proposed feature does not explicitly handle hierarchies; only the hierarchy "leaves" are naturally taken into account by the groupwise transformation.

Do these features still make sense?

Projects
None yet
Development

No branches or pull requests

2 participants