[BUG] Correctly set lagged variables to known when lag >= horizon #1910

Merged
merged 6 commits into from
Jul 10, 2025

@hubkrieb hubkrieb commented Jul 1, 2025

Reference Issues/PRs

Fixes #1909

What does this implement/fix? Explain your changes.

This PR fixes how lagged variables are assigned to the known or unknown variables. Lagged variables that originate from known variables are always set to known; all other lagged variables are set to known only if lag >= horizon, to avoid data leakage.
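The rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code merged in this PR: the function name `split_lagged_variables`, its arguments, and the `_lagged_by_` column naming are hypothetical, and `horizon` is assumed to mean the maximum prediction length.

```python
def split_lagged_variables(lags, known_variables, horizon):
    """Classify lagged columns as known or unknown over the prediction window.

    lags: dict mapping a variable name to a list of integer lags (hypothetical input)
    known_variables: set of variable names whose future values are known
    horizon: maximum number of steps predicted ahead (assumed meaning)
    """
    known, unknown = [], []
    for name, lag_values in lags.items():
        for lag in lag_values:
            lagged_name = f"{name}_lagged_by_{lag}"  # illustrative naming only
            # A lag of a known variable is always known. A lag of an unknown
            # variable is known only if it reaches back at least `horizon` steps.
            if name in known_variables or lag >= horizon:
                known.append(lagged_name)
            else:
                unknown.append(lagged_name)
    return known, unknown


known, unknown = split_lagged_variables(
    lags={"price": [1, 7], "volume": [3, 7]},
    known_variables={"price"},
    horizon=7,
)
print(known)
print(unknown)
```

In this example `price` is assumed to be known in advance, so both of its lags are known; `volume` is not, so only its lag of 7 (equal to the horizon) is treated as known.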

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

So far I have only checked it with the code snippet from the issue, but it probably needs a unit test to keep the behaviour from changing again.

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

codecov bot commented Jul 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@b8dbacc). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1910   +/-   ##
=======================================
  Coverage        ?   86.36%           
=======================================
  Files           ?       96           
  Lines           ?     7801           
  Branches        ?        0           
=======================================
  Hits            ?     6737           
  Misses          ?     1064           
  Partials        ?        0           
Flag Coverage Δ
cpu 86.36% <100.00%> (?)
pytest 86.36% <100.00%> (?)


@fkiraly fkiraly left a comment

Sounds good, well spotted!

If I may ask, how did you notice this?

hubkrieb commented Jul 3, 2025

I had first implemented lagged features manually in the input data for one of my projects and was worried about data leakage. When I noticed that TimeSeriesDataSet has a lags parameter, I checked how lagged variables were handled; the handling didn't look right and wasn't consistent with the existing comment.
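The leakage concern can be made concrete with a small sketch. It assumes forecast steps are numbered 1 to `horizon` after the last observed time point and that the lagged variable itself is not known in the future; the helper name `leaking_steps` is hypothetical and not part of the library.

```python
def leaking_steps(lag, horizon):
    """Return the forecast steps whose lagged value would come from the future.

    At forecast step h (h steps past the last observation), a feature lagged
    by `lag` reads the value from step h - lag. That value has not been
    observed yet whenever h - lag >= 1, i.e. whenever lag < h.
    """
    return [h for h in range(1, horizon + 1) if h - lag >= 1]


print(leaking_steps(lag=3, horizon=7))   # steps that would read unobserved values
print(leaking_steps(lag=7, horizon=7))   # empty list: lag >= horizon is safe
```

With a lag of 3 and a horizon of 7, steps 4 to 7 would need values that are not yet observed, which is why such a lagged variable should stay unknown. Once the lag is at least as long as the horizon, every step reads an already observed value.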

@fkiraly fkiraly left a comment

I see! Could we add that as a test then?

Given that we consider this a bug, we should add a test to prevent this from occurring later again.

hubkrieb commented Jul 4, 2025

I just added a test to check the different cases mentioned in the tables of #1909

One caveat: I couldn't get a test to work for a real-valued variable that is not the target. I think this is because of the bug mentioned in #1587

@fkiraly fkiraly left a comment

Thanks!

fkiraly commented Jul 9, 2025

Review of this would be appreciated, @fnhirwa, @phoeenniixx, @PranavBhatP

@PranavBhatP PranavBhatP left a comment

looks good!

@fnhirwa fnhirwa left a comment

Looks good to me😊

@fnhirwa fnhirwa merged commit 4af8ed5 into sktime:main Jul 10, 2025
35 checks passed
@github-project-automation github-project-automation bot moved this from Under review to Fixed/resolved in Bugfixing - pytorch-forecasting Jul 10, 2025
Labels
bug Something isn't working module:datasets&dataloaders
Projects
Status: Fixed/resolved
Development

Successfully merging this pull request may close these issues.

[BUG] Lagged variables not properly set to known variables when lag >= horizon
4 participants