[FIX] Aggregate unbalanced datasets #190

AzulGarza · 2023-05-11T23:32:25Z

This PR fixes #189.
Tests were added comparing the fixes with the deprecated function aggregate_before (which works as expected in this case).

review-notebook-app · 2023-05-11T23:32:29Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

NudnikShpilkis · 2023-05-12T14:33:24Z

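@FedericoGarza looks like the change to aggregate works, but reconcile still fails. Following my example from earlier:

import pandas as pd
from statsforecast import StatsForecast
import statsforecast.models as sfm
from hierarchicalforecast.core import HierarchicalReconciliation
import hierarchicalforecast.methods as hfm

# hier_df, S_df and tags come from the earlier example (output of aggregate)
train_df = hier_df.query("ds <= @pd.to_datetime('2019-12-31')")
test_df = hier_df.query("ds > @pd.to_datetime('2019-12-31')")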

fcst = StatsForecast(
    models=[
        sfm.AutoARIMA(season_length=12, alias='ARIMA'),
    ],
    freq='M',
    n_jobs=-1,
)

hrec = HierarchicalReconciliation(
    reconcilers=[
        hfm.BottomUp(),
        hfm.MinTrace(method='mint_shrink'),
    ]
)

fcst.fit(df=train_df)
fcst_df = fcst.forecast(h=12, fitted=True)
fitted_df = fcst.forecast_fitted_values()

fcst_df = hrec.reconcile(
    Y_hat_df=fcst_df,
    Y_df=fitted_df,
    S=S_df,
    tags=tags,
)

We get ValueError: cannot reshape array of size 64 into shape (6,newaxis) because fitted_df has a length of 64, while S_df has a length of 6. You'll need to rewrite reconcile to address the reshaping.
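The shape mismatch can be reproduced in isolation with NumPy. This is only a minimal sketch: it assumes that reconcile reshapes a flat array of fitted values into one row per series in S_df, which the thread does not confirm. The sizes 64 and 6 come from the error message above; try_reshape is a hypothetical helper name.

```python
import numpy as np


def try_reshape(n_values: int, n_series: int):
    """Reshape a flat array into (n_series, -1).

    Returns the resulting shape, or the error message if NumPy refuses.
    """
    flat = np.arange(n_values, dtype=float)
    try:
        return flat.reshape(n_series, -1).shape
    except ValueError as exc:
        return str(exc)


# 64 fitted values cannot be split evenly across 6 series.
print(try_reshape(64, 6))

# A balanced panel (6 series x 12 periods = 72 values) reshapes cleanly.
print(try_reshape(72, 6))
```

A reshape of this kind only works when every series contributes the same number of rows, which is not the case for unbalanced data.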

NudnikShpilkis · 2023-05-12T14:42:03Z

Not a bug per se, but I'd clarify the exception for not including Y_df in reconcile. The type hints imply it's always optional, but it's only optional for bootstrap or permbu methods. So the exception "you need to pass Y_df" is confusing in light of the type hint.
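One way such a clearer exception could look is sketched below. Everything here is hypothetical and for illustration only: validate_insample, its needs_insample flag, and the message wording are not taken from the library.

```python
from typing import Optional

import pandas as pd


def validate_insample(
    reconciler_name: str,
    needs_insample: bool,
    Y_df: Optional[pd.DataFrame],
) -> None:
    """Raise a descriptive error when a reconciler needs Y_df but none was given.

    Hypothetical helper: the names and the message are illustrative assumptions.
    """
    if needs_insample and Y_df is None:
        raise ValueError(
            f"Reconciler '{reconciler_name}' requires in-sample values: "
            "pass `Y_df` to reconcile(). `Y_df` is optional only for "
            "reconcilers that do not use in-sample data."
        )


# No error: this reconciler does not need in-sample values.
validate_insample("BottomUp", needs_insample=False, Y_df=None)
```

Naming the reconciler in the message tells the user why an argument typed as optional is required for this particular call.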

candalfigomoro · 2023-05-15T12:03:23Z

My current workaround is to create a copy of train_df without the leading zeros and use that for the forecasting model. For reconcile() I still use the version with the leading zeros.

Note that aggregate() also creates all-zero time series for missing hierarchical combinations, so you also have to add all-zero forecasts to Y_hat_df/fcst_df for unique_ids artificially created by aggregate().
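The workaround described above could be sketched roughly as follows. This is an assumption-laden illustration rather than the commenter's actual code: it assumes long-format frames with unique_id, ds and y columns (unique_id as a regular column, not the index), and drop_leading_zeros and add_zero_forecasts are hypothetical helper names.

```python
import pandas as pd


def drop_leading_zeros(df: pd.DataFrame) -> pd.DataFrame:
    """Remove the leading run of zeros from each series (all-zero series vanish)."""
    df = df.sort_values(["unique_id", "ds"])
    # Running count of non-zero observations per series; it stays at 0
    # for as long as we are still inside the leading-zero run.
    seen_nonzero = (df["y"] != 0).astype(int).groupby(df["unique_id"]).cumsum()
    return df[seen_nonzero > 0].reset_index(drop=True)


def add_zero_forecasts(fcst_df: pd.DataFrame, all_ids, model_col: str) -> pd.DataFrame:
    """Append all-zero forecasts for ids that have no forecast rows."""
    present = set(fcst_df["unique_id"])
    missing = [uid for uid in all_ids if uid not in present]
    if not missing:
        return fcst_df
    horizon = fcst_df["ds"].drop_duplicates().sort_values()
    filler = pd.DataFrame(
        [(uid, ds, 0.0) for uid in missing for ds in horizon],
        columns=["unique_id", "ds", model_col],
    )
    return pd.concat([fcst_df, filler], ignore_index=True)


# Toy data: series "a" has two leading zeros, series "b" is all zeros.
dates = list(pd.date_range("2019-01-01", periods=4, freq="D"))
train = pd.DataFrame({
    "unique_id": ["a"] * 4 + ["b"] * 4,
    "ds": dates * 2,
    "y": [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})
trimmed = drop_leading_zeros(train)  # input for the forecasting model

# Forecasts exist only for "a"; pad "b" with zeros before reconciling.
fcst = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": list(pd.date_range("2019-01-05", periods=2, freq="D")),
    "ARIMA": [1.0, 2.0],
})
padded = add_zero_forecasts(fcst, all_ids=["a", "b"], model_col="ARIMA")
```

Zeros inside a series after its first non-zero value are kept, since only the leading run is treated as an artifact of aggregation.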

AzulGarza · 2023-05-17T21:44:10Z

Hey @NudnikShpilkis and @candalfigomoro! Thank you for your feedback! I've updated the branch to handle this case in the reconcile method. Could you try uninstalling the library and installing it again from this branch? :)

Thank you!

NudnikShpilkis · 2023-05-18T13:44:15Z

@FedericoGarza looks like the code now runs without error. I haven't tested for validity of predictions though.

NudnikShpilkis · 2023-05-23T19:25:04Z

@candalfigomoro

> My current workaround is to create a copy of train_df without the leading zeros and use that for the forecasting model. For reconcile() I still use the version with the leading zeros.

What's the exact formulation of your code? If you fit on the data without the leading zeros and then run forecast on data with the leading zeros it refits, doesn't it?
Where are you passing the data with the leading zeros?

…into 189-aggregate-adds-leading-zeros-to-series-with-different-dates

fix: aggregate unbalanced datasets

b8ae1ef

AzulGarza linked an issue May 11, 2023 that may be closed by this pull request

aggregate adds leading zeros to series with different dates #189

Closed

AzulGarza requested a review from cchallu May 11, 2023 23:33

AzulGarza added 3 commits May 11, 2023 17:36

fix: recover sparse argument

30a570e

fix: install dev deps

98468f8

fix: change cells order (prevent errors)

0330884

AzulGarza added 2 commits May 17, 2023 15:25

fix: install dev deps circleci

4f42498

fix: reconcile with series of different size

9861bf0

cchallu approved these changes Jun 6, 2023

Merge branch 'main' of https://github.com/Nixtla/hierarchicalforecast …

12747dc

…into 189-aggregate-adds-leading-zeros-to-series-with-different-dates

AzulGarza merged commit c107217 into main Jun 6, 2023

AzulGarza deleted the 189-aggregate-adds-leading-zeros-to-series-with-different-dates branch June 6, 2023 17:18

mcsqr mentioned this pull request Aug 23, 2023

Fixes for large datasets #229

Merged
