fix: Adjust evaluation business logic #535
Conversation
```python
import pandas as pd


def generate_real_training_end_timestamp(df_trainings: pd.DataFrame) -> pd.Series:
```
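Going by the Motivation below, a minimal sketch of what this helper might compute; the `first_timestamp` column name and the open-ended last interval are assumptions, not the actual implementation:

```python
import pandas as pd


def generate_real_training_end_timestamp(df_trainings: pd.DataFrame) -> pd.Series:
    # Assumed layout: one row per training, sorted by training start, with a
    # `first_timestamp` column holding the interval start in epoch seconds.
    # The real end of each training interval is the start of the *next*
    # training interval minus 1; the last interval has no successor, so its
    # end stays open (NaN).
    return df_trainings["first_timestamp"].shift(-1) - 1
```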
Currently unused; it will be used in the analytics tool.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #535      +/-   ##
==========================================
- Coverage   82.49%   82.46%   -0.04%
==========================================
  Files         215      216       +1
  Lines       10054    10059       +5
==========================================
+ Hits         8294     8295       +1
- Misses       1760     1764       +4
```

☔ View full report in Codecov by Sentry.
LGTM. I guess in most cases one should set the start timestamp to 0 if they want a neat triggering boundary. But I guess finding the correct setting here is the task of the user.
I will wait for the integration tests to run through for this one and then merge.
For continuous datasets with periodic evaluation, not necessarily. There you don't really care where it starts or what the exact bounds are; you just care about time proximity. But yes, for the current slicing approaches we need to set it!
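To make the boundary discussion concrete, here is a hypothetical sketch of how a time trigger could derive window boundaries from a start timestamp (an illustration of the trade-off being discussed, not the actual trigger implementation; all names are invented):

```python
def window_start(timestamp: int, start_timestamp: int, every_sec: int) -> int:
    # A sample at `timestamp` falls into the window beginning at this boundary.
    # With start_timestamp = 0, boundaries are neat multiples of `every_sec`;
    # a non-zero start shifts every window by the same offset.
    return start_timestamp + ((timestamp - start_timestamp) // every_sec) * every_sec


# start_timestamp = 0: daily windows align at 0, 86400, 172800, ...
assert window_start(100_000, start_timestamp=0, every_sec=86_400) == 86_400
# Non-zero start: the same sample falls into a shifted window.
assert window_start(100_000, start_timestamp=50_000, every_sec=86_400) == 50_000
```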
@MaxiBoether could you kick off a cglm pipeline after merging this? I need a pipeline log file with correct training intervals (not affected by the first sample) |
Motivation
So far, `real_last_timestamp` (start of the next training interval - 1, marking the end of the current training interval) was only used for plotting the boxes in the heatmap plot. For decisions w.r.t. currently active models this doesn't work: e.g., if the next year in a dataset has no data at the year start, our training interval would extend into this next year, and therefore its model won't be considered for the current evaluation interval.
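A hypothetical worked example of this failure mode (all timestamps invented for illustration):

```python
# Training A covers 2004; training B only starts 2005-03-01 because the
# dataset has no data in January/February 2005 (epoch seconds, UTC).
train_a_last_sample = 1_104_530_000  # ~2004-12-31
train_b_start = 1_109_635_200        # 2005-03-01
eval_interval_start = 1_104_537_600  # 2005-01-01

# Old logic: training A "ends" at the start of B minus 1, which extends into
# 2005, so A does not count as finished before the evaluation interval.
extended_end = train_b_start - 1
print(extended_end < eval_interval_start)         # False -> model A wrongly skipped

# Fixed logic: use the timestamp of A's actual last training sample.
print(train_a_last_sample < eval_interval_start)  # True -> model A is active
```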
Note
Independently of the bug we fix with the `start_timestamp` in `timetrigger`, this setting allows us to effectively do pre-training with the first trigger (e.g., in the continuous arxiv dataset we have sparse data from 1988 to ~2005). We could simply start the schedule in year 2005; then the first trigger trains on all previous data.
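Sketched numerically (the boundary value and granularity are assumptions): if the schedule starts at the beginning of 2005, every sample before that boundary falls into the first window, so the first trigger trains on all of 1988-2005 in one go:

```python
SCHEDULE_START = 1_104_537_600  # 2005-01-01 UTC, assumed schedule start


def batch_for(sample_ts: int) -> str:
    # Everything before the schedule start lands in the first (pre-training)
    # batch; later samples fall into the regular periodic windows.
    return "pre-training batch" if sample_ts < SCHEDULE_START else "regular window"


print(batch_for(567_993_600))    # 1988 sample -> pre-training batch
print(batch_for(1_262_304_000))  # 2010 sample -> regular window
```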