Commit
fix: Adjust evaluation business logic (#535)
# Motivation

- We want to use the `real_last_timestamp` (start of the next training interval - 1, marking the end of the current training interval) only for plotting the boxes in the heatmap plot. For decisions about currently active models this doesn't work. E.g. if the next year in a dataset has no data at the start of the year, our training interval would extend into this next year, and therefore its model wouldn't be considered for the current evaluation interval.
- timetriggers should allow starting at a statically defined start point (and not with the first sample); otherwise, the whole schedule is off by a couple of days.

# Note

Independently of the bug we fix with the `start_timestamp` in `timetrigger`, this setting allows us to effectively do `pre-training` with the first trigger (e.g. in the continuous arxiv dataset we have sparse data from 1988 to ~2005). We could simply start the schedule in the year 2005; then the first trigger trains on all previous data.
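To make the scheduling point concrete, here is a minimal sketch of how a fixed start point changes which trigger interval each sample falls into. It is not the project's `timetrigger` implementation: the function name `assign_trigger_intervals`, its parameters, and the rule that samples before the start point are folded into the first interval are all assumptions made for illustration.

```python
from typing import Optional


def assign_trigger_intervals(
    sample_times: list[int], interval: int, start_timestamp: Optional[int] = None
) -> list[int]:
    """Hypothetical helper: map each sample time to a trigger interval index.

    Without a start_timestamp the schedule is anchored at the first sample.
    With one, the schedule is anchored at that fixed point, and samples
    before it are assigned to interval 0 (the "pre-training" case).
    """
    origin = sample_times[0] if start_timestamp is None else start_timestamp
    # Floor division yields negative indices for samples before the origin;
    # clamp them to 0 so they join the first interval.
    return [max(0, (t - origin) // interval) for t in sample_times]


times = [3, 8, 12, 21]  # e.g. days since some epoch
anchored_at_first_sample = assign_trigger_intervals(times, interval=10)
anchored_at_zero = assign_trigger_intervals(times, interval=10, start_timestamp=0)
with_pretraining = assign_trigger_intervals(times, interval=10, start_timestamp=10)
```

With the schedule anchored at the first sample (time 3), the interval boundaries sit at 13, 23, and so on, which is the "off by a couple of days" effect. Anchoring at 0 puts them at 10, 20, and so on. Anchoring at 10 folds the earlier samples into the first interval.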
1 parent c0b0ae8, commit beb3769
Showing 6 changed files with 57 additions and 38 deletions.
@@ -0,0 +1,18 @@
import pandas as pd


def generate_real_training_end_timestamp(df_trainings: pd.DataFrame) -> pd.Series:
    """
    For sparse datasets we want to use next_training_start-1 as training interval end instead of last_timestamp
    as there could be a long gap between the max(sample_time) in one training batch and the min(sample_time) in
    the next training batch.
    e.g. if we want to train for 1.1.2020-31.12.2020 but only have timestamps on 1.1.2020, last_timestamp
    would be 1.1.2020, but the next training would start on 1.1.2021.

    Args:
        df_trainings: The pipeline stage execution tracking information including training and model infos.

    Returns:
        The real last timestamp series.
    """
    return df_trainings["first_timestamp"].shift(-1, fill_value=df_trainings.iloc[-1]["last_timestamp"] + 1) - 1
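As a rough illustration of what this helper computes, the sketch below repeats the function (without its docstring, so the block is self-contained) and applies it to a small made-up frame. The timestamp values are invented for the example, and the frame is assumed to be sorted by training order and non-empty, since `iloc[-1]` would fail on an empty frame.

```python
import pandas as pd


def generate_real_training_end_timestamp(df_trainings: pd.DataFrame) -> pd.Series:
    # Same logic as in the diff: each training ends one tick before the next
    # training starts; the last training keeps its own last_timestamp.
    next_start = df_trainings["first_timestamp"].shift(
        -1, fill_value=df_trainings.iloc[-1]["last_timestamp"] + 1
    )
    return next_start - 1


# Made-up example: three trainings with gaps between batches.
df_trainings = pd.DataFrame(
    {
        "first_timestamp": [0, 100, 200],
        "last_timestamp": [5, 150, 260],
    }
)
df_trainings["real_last_timestamp"] = generate_real_training_end_timestamp(df_trainings)
print(df_trainings)
```

The first training only has samples up to timestamp 5, but its `real_last_timestamp` becomes 99 because the next training starts at 100. The final row falls back to its own `last_timestamp` (260), since no later training exists.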