Restarting process #458
With a flag set, you can cloudpickle the results; if the node is already running and the results file exists, the file is loaded instead of running again; the results file gets cleaned up either way. This is really only _useful_ when the node is running on an executor process that _doesn't_ die when the parent process dies.
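A minimal sketch of that load-or-run behaviour, under stated assumptions: `run_or_load`, its parameters, and the file name are hypothetical and are not the API in this PR, and the standard-library `pickle` stands in for cloudpickle so the snippet has no third-party dependency.

```python
import pickle
import tempfile
from pathlib import Path


def run_or_load(func, results_file, running, clean=True):
    """Hypothetical helper: load a serialized result if one exists, else run."""
    if running and results_file.exists():
        # We came back to something already marked as running and found its
        # results on disk: load them instead of running again.
        with results_file.open("rb") as f:
            result = pickle.load(f)
    else:
        result = func()
        with results_file.open("wb") as f:
            pickle.dump(result, f)
    if clean:
        # The results file is removed whether it was loaded or freshly written.
        results_file.unlink(missing_ok=True)
    return result


calls = []


def expensive():
    calls.append(1)
    return {"energy": -1.5}


with tempfile.TemporaryDirectory() as tmp:
    results_file = Path(tmp) / "run_result.pckl"
    # First pass: run and leave the file behind, imitating an executor that
    # outlived the parent process (clean=False plays the role of `_do_clean`).
    first = run_or_load(expensive, results_file, running=False, clean=False)
    file_left_behind = results_file.exists()
    # Second pass: status is "running" and the file exists, so it is loaded.
    second = run_or_load(expensive, results_file, running=True)
    file_cleaned = not results_file.exists()
```

In this sketch `expensive` is only called once; the second call is served from the file, which is then deleted.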
Superseded by #476
To work with long-duration nodes on executors that survive the shutdown of the parent workflow/node python process (e.g. `executorlib` using slurm), we need to be able to tell the run paradigm to serialize the results, and to try to load such a serialization if we come back and the node is running.

This introduces new attributes `Node.serialize_results` to trigger the result serialization, and a private `Node._do_clean` to let power users (i.e. me writing the unit tests) stop the serialized results from getting cleaned up automatically at read-time.

Under the hood, `Node` now directly implements `Runnable.on_run` and `Runnable.run_args`, leveraging the new detached path from #457 to make sure that each run has access to a semantically relevant path for writing the temporary output file (using cloudpickle). Child classes of `Node` implement new abstract methods `Node._on_run` and `Node._run_args` in place of the previous `Runnable` abstract methods they implemented.

TODO:

- Figure out how to get the node/parent workflow to checkpoint itself after it has set its status to "running" when it is going to rely on serialization
- Rebase this onto `main`; since breaking the `Runnable.run` cycle down, getting a right-before-running-checkpoint should be quite easy
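The delegation described in the "Under the hood" paragraph can be sketched roughly as follows. This is an illustration of the pattern only: the class and method names come from the description, but the signatures, the bodies, the `log` attribute, and the `Add` child class are assumptions and do not reproduce the real `Runnable` or `Node`.

```python
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Stand-in base class: run() calls on_run with whatever run_args gives."""

    @abstractmethod
    def on_run(self, *args, **kwargs): ...

    @abstractmethod
    def run_args(self): ...

    def run(self):
        args, kwargs = self.run_args()
        return self.on_run(*args, **kwargs)


class Node(Runnable, ABC):
    """Implements the Runnable hooks itself and delegates to private ones."""

    def __init__(self):
        self.log = []  # hypothetical, only here to make the wrapping visible

    def on_run(self, *args, **kwargs):
        self.log.append("before")  # a load-if-exists check could live here
        result = self._on_run(*args, **kwargs)
        self.log.append("after")  # result serialization could live here
        return result

    def run_args(self):
        return self._run_args()

    @abstractmethod
    def _on_run(self, *args, **kwargs): ...

    @abstractmethod
    def _run_args(self): ...


class Add(Node):
    """Hypothetical child class: it only implements the private hooks."""

    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def _on_run(self, x, y):
        return x + y

    def _run_args(self):
        return (self.x, self.y), {}


node = Add(1, 2)
total = node.run()
```

The point of the extra layer is that `Node` gets a single place to wrap every child's run logic, so children need no knowledge of the serialization step.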