Refactor/torch datasets improvement #2798

Merged (28 commits) on May 1, 2025

@dennisbader (Collaborator) commented Apr 28, 2025

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2686.

Summary

  • simplifies training and inference datasets for TorchForecastingModel
  • all training datasets now have uniform output: Tuple[past target, past cov, historic future cov, future cov, static cov, sample weight, future target]
  • all inference datasets now have uniform output: Tuple[past target, past cov, future past cov, historic future cov, future cov, static cov, target TimeSeries, pred start time]
  • instead of having covariates-specific datasets, the base datasets can now handle all covariates together. The remaining datasets are:
    • Training:
      • ShiftedTrainingDataset (old GenericShiftedDataset adapted to handle all covariates)
      • SequentialTrainingDataset (replaces old *CovariatesSequentialDataset)
      • HorizonBasedTrainingDataset (replaces old HorizonBasedDataset)
    • Prediction:
      • SequentialInferenceDataset (replaces old *CovariatesInferenceDataset)
  • simplified HorizonBasedDataset to use ShiftedTrainingDataset as parent
  • I observed a 10-15% performance boost for prediction with a model that uses all covariates
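
To make the two uniform output formats listed above concrete, here is a minimal sketch that gives each tuple slot a name. The class names `TrainingSample` and `InferenceSample`, the field names, and the helper `present_fields` are hypothetical and chosen only for illustration; they are not part of the Darts API. The sketch also assumes that a slot with no data is `None`, and it uses plain lists as stand-ins for arrays.

```python
from typing import Any, NamedTuple, Optional


class TrainingSample(NamedTuple):
    """Hypothetical named view of the 7-element training tuple."""

    past_target: Any
    past_covariates: Optional[Any]
    historic_future_covariates: Optional[Any]
    future_covariates: Optional[Any]
    static_covariates: Optional[Any]
    sample_weight: Optional[Any]
    future_target: Any


class InferenceSample(NamedTuple):
    """Hypothetical named view of the 8-element inference tuple."""

    past_target: Any
    past_covariates: Optional[Any]
    future_past_covariates: Optional[Any]
    historic_future_covariates: Optional[Any]
    future_covariates: Optional[Any]
    static_covariates: Optional[Any]
    target_series: Any
    pred_start_time: Any


def present_fields(sample) -> list:
    """Return the names of slots that carry data (assumes unused slots are None)."""
    return [name for name, value in zip(sample._fields, sample) if value is not None]


# Stand-in sample: only past covariates are in use, everything else is None.
train_sample = TrainingSample(
    past_target=[1.0, 2.0, 3.0],
    past_covariates=[0.1, 0.2, 0.3],
    historic_future_covariates=None,
    future_covariates=None,
    static_covariates=None,
    sample_weight=None,
    future_target=[4.0],
)

print(present_fields(train_sample))
```

Because every sample has the same length and slot order regardless of which covariates a model uses, downstream code can unpack it in one way instead of branching per dataset type.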

@dennisbader dennisbader requested a review from madtoinou as a code owner April 28, 2025 11:21

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 98.20467% with 10 lines in your changes missing coverage. Please review.

Project coverage is 95.13%. Comparing base (2309556) to head (34b3c21).
Report is 1 commit behind head on master.

Files with missing lines                                Patch %   Lines
darts/utils/data/__init__.py                            25.00%    6 Missing ⚠️
...arts/models/forecasting/torch_forecasting_model.py   94.80%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2798      +/-   ##
==========================================
+ Coverage   94.61%   95.13%   +0.51%     
==========================================
  Files         145      145              
  Lines       15458    15049     -409     
==========================================
- Hits        14626    14317     -309     
+ Misses        832      732     -100     
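
The percentages in the coverage diff can be rechecked from the Hits and Lines rows. The short sketch below does that arithmetic; the helper names `coverage_pct` and `truncate2` are hypothetical. Truncating to two decimals (rather than rounding) is an assumption that happens to reproduce the figures shown here, not documented Codecov behavior.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines divided by total lines."""
    return 100.0 * hits / lines


def truncate2(value: float) -> float:
    """Cut a value to two decimals without rounding (assumed display rule)."""
    return math.floor(value * 100) / 100


base = coverage_pct(14626, 15458)  # master: Hits / Lines
head = coverage_pct(14317, 15049)  # this PR: Hits / Lines

print(truncate2(base), truncate2(head), truncate2(head - base))
```

The Misses row follows the same bookkeeping: Lines minus Hits gives 832 for the base and 732 for the head.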


@dennisbader dennisbader merged commit 06910d8 into master May 1, 2025
9 checks passed
@dennisbader dennisbader deleted the refactor/torch_datasets_improvement branch May 1, 2025 12:09
Development

Successfully merging this pull request may close these issues.

Uniformise the dataset output format
2 participants