Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Aggregate unbalanced datasets #190

Conversation

AzulGarza
Copy link
Member

This PR fixes #189.
Tests were added comparing the fixes with the deprecated function aggregate_before (which works as expected in this case).

@AzulGarza AzulGarza linked an issue May 11, 2023 that may be closed by this pull request
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@AzulGarza AzulGarza requested a review from cchallu May 11, 2023 23:33
@NudnikShpilkis
Copy link

@FedericoGarza looks like the change to aggregate works, but reconcile still fails. Following my example from earlier

train_df = hier_df.query("ds <= @pd.to_datetime('2019-12-31')")
test_df = hier_df.query("ds > @pd.to_datetime('2019-12-31')")

fcst = StatsForecast(
    models=[
        sfm.AutoARIMA(season_length=12, alias='ARIMA'),
    ],
    freq='M',
    n_jobs=-1,
)

hrec = HierarchicalReconciliation(
    reconcilers=[
        hfm.BottomUp(),
        hfm.MinTrace(method='mint_shrink'),
    ]
)

fcst.fit(df=train_df)
fcst_df = fcst.forecast(h=12, fitted=True)
fitted_df = fcst.forecast_fitted_values()

fcst_df = hrec.reconcile(
    Y_hat_df=fcst_df,
    Y_df=fitted_df,
    S=S_df,
    tags=tags,
)

We get ValueError: cannot reshape array of size 64 into shape (6,newaxis) becausefitted_df have a length of 64, while S_df has a length of 6. You'll need to rewrite reconcile to address the reshaping.

@NudnikShpilkis
Copy link

Not a bug per se, but I'd clarify the exception for not including Y_df in reconcile. The type hints imply it's always optional, but it's only optional for bootstrap or permbu methods. So the exception "you need to pass Y_df" is confusing in light of the type hint.

@candalfigomoro
Copy link

My current workaround is to create a copy of train_df without the leading zeros and use that for the forecasting model. For reconcile() I still use the version with the leading zeros.

Note that aggregate() also creates all-zero time series for missing hierarchical combinations, so you also have to add all-zero forecasts to Y_hat_df/fcst_df for unique_ids artificially created by aggregate().

@AzulGarza
Copy link
Member Author

hey @NudnikShpilkis and @candalfigomoro! Thank you for your feedback! I've updated the branch to consider this case in the reconcile method. Could you try uninstalling and installing the library again from this branch? :)

Thank you!

@NudnikShpilkis
Copy link

@FedericoGarza looks like the code now runs without error. I haven't tested for validity of predictions though.

@NudnikShpilkis
Copy link

@candalfigomoro

My current workaround is to create a copy of train_df without the leading zeros and use that for the forecasting model. For reconcile() I still use the version with the leading zeros.

What's the exact formulation of your code? If you fit on the data without the leading zeros and then run forecast on data with the leading zeros it refits, doesn't it?
Where are you passing the data with the leading zeros?

…into 189-aggregate-adds-leading-zeros-to-series-with-different-dates
@AzulGarza AzulGarza merged commit c107217 into main Jun 6, 2023
@AzulGarza AzulGarza deleted the 189-aggregate-adds-leading-zeros-to-series-with-different-dates branch June 6, 2023 17:18
@mcsqr mcsqr mentioned this pull request Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aggregate adds leading zeros to series with different dates
4 participants