Having multiple `Dataset`s ending in parallel tasks doesn't work #985
gnn added a commit that referenced this issue on Oct 19, 2022:
In order to reproduce the issue, the test creates two `Dataset`s ending in parallel tasks and sets their `__module__` attribute to a module below the `egon.data.datasets` package to simulate having the `Dataset`s created there. Then the test checks that the ids of the final `Dataset` tasks are distinct.
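The reproduction strategy in the commit message above can be sketched without egon-data itself. This is a minimal, hypothetical sketch: `FakeDataset`, `final_task_id`, the `some_module` module name and the `.update-version` suffix are illustrative stand-ins, not egon-data's API. It only shows that assigning `__module__` on an instance makes the object look as if it had been created in another module, after which the final task ids can be compared.

```python
from dataclasses import dataclass

PACKAGE = "egon.data.datasets"


@dataclass
class FakeDataset:
    """Hypothetical stand-in for egon-data's `Dataset` (not its real API)."""

    name: str


def final_task_id(dataset: FakeDataset) -> str:
    """Hypothetical id scheme for the task that runs after parallel tasks.

    The package prefix is stripped from the module name, but the dataset
    name is kept, so datasets sharing a module still get distinct ids.
    """
    module = dataset.__module__
    if module.startswith(PACKAGE + "."):
        module = module[len(PACKAGE) + 1 :]
    return f"{module}.{dataset.name}.update-version"


def test_final_task_ids_are_distinct() -> None:
    first, second = FakeDataset("first"), FakeDataset("second")
    # Simulate both datasets having been created in a module below the
    # package by overriding `__module__` on each instance.
    for dataset in (first, second):
        dataset.__module__ = PACKAGE + ".some_module"
    assert final_task_id(first) != final_task_id(second)


if __name__ == "__main__":
    test_final_task_ids_are_distinct()
    print("final task ids are distinct")
```

Assigning to `__module__` on an instance works here because the class attribute is a plain string, so the assignment simply creates an instance attribute that shadows it.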
nailend added a commit that referenced this issue on Nov 1, 2022.
gnn added a commit that referenced this issue on Nov 1, 2022:
Merge the "fixes/#985-create-unique-ids-on-generated-tasks" branch into the "continuous-integration/run-everything-2022-11-01" branch so that the fix can pass its final test in order to move into the "dev" branch.
More specifically, having multiple `Dataset`s ending in parallel tasks doesn't work if those `Dataset`s are in a module below the `egon.data.datasets` package. In that case, the code removing the module name prefix from task ids and the code generating the final dataset task, which updates the dataset version once all parallel tasks have finished, interact in a way that generates non-distinct task ids, so tasks generated later clobber the ones generated earlier. This leads to spurious cycles and other inconsistencies and bugs in the graph.
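The clobbering can be illustrated with a deliberately simplified, hypothetical model; none of these names or the id scheme come from egon-data's code. It assumes task ids are built by stripping the `egon.data.datasets.` prefix from the module name and appending a fixed suffix, and that tasks are stored in a mapping keyed by id. Under those assumptions, two datasets in the same module get the same final task id and the later one silently replaces the earlier one; adding the dataset name to the id keeps them apart.

```python
PACKAGE = "egon.data.datasets"


def strip_package(module: str) -> str:
    # Remove the package prefix so ids read e.g. "some_module.<suffix>".
    prefix = PACKAGE + "."
    return module[len(prefix):] if module.startswith(prefix) else module


def naive_final_id(module: str, dataset_name: str) -> str:
    # Hypothetical: ignores the dataset name, so all datasets in one
    # module produce the same final task id.
    return f"{strip_package(module)}.update-version"


def unique_final_id(module: str, dataset_name: str) -> str:
    # Hypothetical fix: including the dataset name keeps ids distinct.
    return f"{strip_package(module)}.{dataset_name}.update-version"


def register(datasets, make_id):
    """Map final task id -> dataset name, like a graph keyed by task id."""
    tasks = {}
    for module, name in datasets:
        # A duplicate id overwrites the entry registered earlier.
        tasks[make_id(module, name)] = name
    return tasks


datasets = [
    (PACKAGE + ".some_module", "first"),
    (PACKAGE + ".some_module", "second"),
]
print(register(datasets, naive_final_id))
# -> {'some_module.update-version': 'second'}
print(register(datasets, unique_final_id))
# -> {'some_module.first.update-version': 'first', 'some_module.second.update-version': 'second'}
```

In the naive case the final task of `first` is lost, which in a real task graph would leave dependencies pointing at the wrong task.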