-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal implementation of base restart workchain #3748
Minimal implementation of base restart workchain #3748
Conversation
62693d6
to
29b9836
Compare
The calculation job class used exit codes in the range `100 - 199` which are reserved for scheduler errors. They have been changed to be in the three hundred range.
29b9836
to
8472e51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sphuber.
I tested this functionality in the raspa plugin (lsmo-epfl/aiida-raspa#56) - works fine.
I also left some minor comments, suggestions.
The `BaseRestartWorkChain` is designed to help the writing of a base workchain that wraps the launching of a sub process, for example a calculation job, and provides a framework for easily adding automated error handling and performing sanity checks. The class and its utilities were originally implemented in the `aiida-quantumespresso` plugin but were designed from the get-go to be generically applicable. It was quickly adopted and used in other plugins. Therefore we are now moving it to `aiida-core` so that it can be useful to many plugins without separate copies of the code having to be maintained. Originally it was only designed to wrap around calculation jobs, but the concept is generic enough that here it is generalized to any sub process, such as `WorkChains`. The basic concept is simple, instead of subclassing from `WorkChain` one subclasses the `BaseRestartWorkChain` and uses at a minimum the outline: cls.setup while_(cls.should_run_process)( cls.run_process, cls.inspect_process, ), cls.results The logic of those outline methods is implemented on the base class so the only thing that remains is to specify the process sub class that needs to be used. The sub process will be launched until it is sucessful, i.e. exit status 0, or the maximum number of iterations is exceeded. The `inspect_process` will loop over the list of registered process handlers. The handlers can check for errors in case that the sub process failed, or perform sanity checks even if the sub process was successful. The handlers can return a `ProcessHandlerReport` to control the further flow, for example by breaking out of the process handler call loop or even completely aborting the workchain if an unrecoverable problem was encountered.
To clarify the `register_process_handler` decorator, the `priority` argument is made a keyword-only argument such that callers are forced to specify it. Having just an integer in the method arguments can be confusing as to its meaning. This commit also adds type checking to the `priority` keyword.
The `exit_codes` argument takes a single or list of `ExitCode` instances. If defined, the handler will return `None` if the exit code set on the `node` does not appear in the `exit_codes`. This is useful to have a handler called only when the process failed with a specific (set of) exit code(s).
8472e51
to
a2e9f09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sphuber! For me, this is good to go.
Fixes #3644
This is an alternative PR to #3654 that removes two features that are still contentious:
The other PR contained a first implementation of these features but since the provenance is easily lost when using these constructs, we postpone integration until later when we have had more time to investigate them. Since they will be purely additive features they can be added later without affecting the existing interface.