Minimal implementation of base restart workchain #3748

sphuber · 2020-02-10T13:38:41Z

This is an alternative PR to #3654 that removes two features that are still contentious:

Registering additional process handlers per workchain implementation through entry points
Registering additional process handlers per workchain instance launched through inputs

The other PR contained a first implementation of these features but since the provenance is easily lost when using these constructs, we postpone integration until later when we have had more time to investigate them. Since they will be purely additive features they can be added later without affecting the existing interface.

The calculation job class used exit codes in the range `100 - 199` which are reserved for scheduler errors. They have been changed to be in the three hundred range.

yakutovicha

Thanks, @sphuber.

I tested this functionality in the raspa plugin (lsmo-epfl/aiida-raspa#56) - works fine.

I also left some minor comments, suggestions.

.ci/workchains.py

.github/workflows/setup.sh

aiida/engine/processes/workchains/restart.py

aiida/engine/processes/workchains/utils.py

The `BaseRestartWorkChain` is designed to help the writing of a base workchain that wraps the launching of a sub process, for example a calculation job, and provides a framework for easily adding automated error handling and performing sanity checks. The class and its utilities were originally implemented in the `aiida-quantumespresso` plugin but were designed from the get-go to be generically applicable. It was quickly adopted and used in other plugins. Therefore we are now moving it to `aiida-core` so that it can be useful to many plugins without separate copies of the code having to be maintained. Originally it was only designed to wrap around calculation jobs, but the concept is generic enough that here it is generalized to any sub process, such as `WorkChains`. The basic concept is simple, instead of subclassing from `WorkChain` one subclasses the `BaseRestartWorkChain` and uses at a minimum the outline: cls.setup while_(cls.should_run_process)( cls.run_process, cls.inspect_process, ), cls.results The logic of those outline methods is implemented on the base class so the only thing that remains is to specify the process sub class that needs to be used. The sub process will be launched until it is sucessful, i.e. exit status 0, or the maximum number of iterations is exceeded. The `inspect_process` will loop over the list of registered process handlers. The handlers can check for errors in case that the sub process failed, or perform sanity checks even if the sub process was successful. The handlers can return a `ProcessHandlerReport` to control the further flow, for example by breaking out of the process handler call loop or even completely aborting the workchain if an unrecoverable problem was encountered.

To clarify the `register_process_handler` decorator, the `priority` argument is made a keyword-only argument such that callers are forced to specify it. Having just an integer in the method arguments can be confusing as to its meaning. This commit also adds type checking to the `priority` keyword.

The `exit_codes` argument takes a single or list of `ExitCode` instances. If defined, the handler will return `None` if the exit code set on the `node` does not appear in the `exit_codes`. This is useful to have a handler called only when the process failed with a specific (set of) exit code(s).

yakutovicha

Thanks, @sphuber! For me, this is good to go.

sphuber force-pushed the fix_3644_minimal_base_restart branch from 62693d6 to 29b9836 Compare February 10, 2020 13:53

yakutovicha self-requested a review February 10, 2020 16:29

ArithmeticAddCalculation: correct exit codes

3869a63

The calculation job class used exit codes in the range `100 - 199` which are reserved for scheduler errors. They have been changed to be in the three hundred range.

sphuber force-pushed the fix_3644_minimal_base_restart branch from 29b9836 to 8472e51 Compare February 10, 2020 17:58

yakutovicha requested changes Feb 11, 2020

View reviewed changes

sphuber added 3 commits February 11, 2020 10:28

sphuber force-pushed the fix_3644_minimal_base_restart branch from 8472e51 to a2e9f09 Compare February 11, 2020 09:28

This was referenced Feb 11, 2020

Add mechanism to "dynamically" add process handlers #3750

Closed

Add mechanism to disable statically registered process handlers #3751

Closed

Add the BaseRestartWorkChain to aiida-core #3654

Closed

yakutovicha self-requested a review February 11, 2020 09:45

yakutovicha approved these changes Feb 11, 2020

View reviewed changes

sphuber merged commit 1ba87c0 into aiidateam:develop Feb 11, 2020

sphuber deleted the fix_3644_minimal_base_restart branch February 11, 2020 09:48

yakutovicha mentioned this pull request Mar 25, 2020

Add error fixer mechanism to the Base work chain. aiidateam/aiida-cp2k#61

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minimal implementation of base restart workchain #3748

Minimal implementation of base restart workchain #3748

sphuber commented Feb 10, 2020 •

edited

Loading

yakutovicha left a comment

yakutovicha left a comment

Minimal implementation of base restart workchain #3748

Minimal implementation of base restart workchain #3748

Conversation

sphuber commented Feb 10, 2020 • edited Loading

yakutovicha left a comment

Choose a reason for hiding this comment

yakutovicha left a comment

Choose a reason for hiding this comment

sphuber commented Feb 10, 2020 •

edited

Loading