-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure immutability of CalcJobNode
hash before and after storing
#3130
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The `Node._get_objects_to_hash` method, which returns all the objects of a `Node` instance that should be included in computing its hash, also included the file repository. This proved problematic for the `CalcJobNode` sub class. The hash of a node is computed during the store method, in order to determine if it can be cached from an existing node with an identical hash. However, at that point, the repository of the node is empty, but as soon as the node is stored and process is started, the input files that are generated by the `CalcJob` class based on the inputs, are stored in the repository of the node. If the hash were to be recomputed at that point, it would be different from the original hash computed during storing. This breaks the caching mechanism. Since the input files that are written to the repository by the `CalcJob` are derivatives of the input nodes (which are included in the hash), and therefore do not semantically add anything to the provenance, we can simply ignore them. Since this only applies to the `CalcJobNode`, we simply override the method in that sub class and omit the repository. N.B.: in this commit we also loosen the condition of when a completed process node is considered to be a valid cache. Before only processes that finished successfully (i.e. with process state `finished` and an exit status `0`) were considered valid caches. This is loosened to also accept non-zero exit statuses. The new rule then considers all processes that have `finished` as a valid cache, excluding only `excepted` and `killed` processes.
8c4385e
to
ceb3c78
Compare
ltalirz
approved these changes
Jul 5, 2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @sphuber for the fix!
Merged
d-tomerini
pushed a commit
to d-tomerini/aiida_core
that referenced
this pull request
Sep 30, 2019
…iidateam#3130) The `Node._get_objects_to_hash` method, which returns all the objects of a `Node` instance that should be included in computing its hash, also included the file repository. This proved problematic for the `CalcJobNode` sub class. The hash of a node is computed during the store method, in order to determine if it can be cached from an existing node with an identical hash. However, at that point, the repository of the node is empty, but as soon as the node is stored and process is started, the input files that are generated by the `CalcJob` class based on the inputs, are stored in the repository of the node. If the hash were to be recomputed at that point, it would be different from the original hash computed during storing. This breaks the caching mechanism. Since the input files that are written to the repository by the `CalcJob` are derivatives of the input nodes (which are included in the hash), and therefore do not semantically add anything to the provenance, we can simply ignore them. Since this only applies to the `CalcJobNode`, we simply override the method in that sub class and omit the repository. N.B.: in this commit we also loosen the condition of when a completed process node is considered to be a valid cache. Before only processes that finished successfully (i.e. with process state `finished` and an exit status `0`) were considered valid caches. This is loosened to also accept non-zero exit statuses. The new rule then considers all processes that have `finished` as a valid cache, excluding only `excepted` and `killed` processes.
d-tomerini
pushed a commit
to d-tomerini/aiida_core
that referenced
this pull request
Oct 16, 2019
…iidateam#3130) The `Node._get_objects_to_hash` method, which returns all the objects of a `Node` instance that should be included in computing its hash, also included the file repository. This proved problematic for the `CalcJobNode` sub class. The hash of a node is computed during the store method, in order to determine if it can be cached from an existing node with an identical hash. However, at that point, the repository of the node is empty, but as soon as the node is stored and process is started, the input files that are generated by the `CalcJob` class based on the inputs, are stored in the repository of the node. If the hash were to be recomputed at that point, it would be different from the original hash computed during storing. This breaks the caching mechanism. Since the input files that are written to the repository by the `CalcJob` are derivatives of the input nodes (which are included in the hash), and therefore do not semantically add anything to the provenance, we can simply ignore them. Since this only applies to the `CalcJobNode`, we simply override the method in that sub class and omit the repository. N.B.: in this commit we also loosen the condition of when a completed process node is considered to be a valid cache. Before only processes that finished successfully (i.e. with process state `finished` and an exit status `0`) were considered valid caches. This is loosened to also accept non-zero exit statuses. The new rule then considers all processes that have `finished` as a valid cache, excluding only `excepted` and `killed` processes.
7 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #3125
The
Node._get_objects_to_hash
method, which returns all the objects ofa
Node
instance that should be included in computing its hash, alsoincluded the file repository. This proved problematic for the
CalcJobNode
sub class. The hash of a node is computed during the storemethod, in order to determine if it can be cached from an existing node
with an identical hash. However, at that point, the repository of the
node is empty, but as soon as the node is stored and process is started,
the input files that are generated by the
CalcJob
class based on theinputs, are stored in the repository of the node. If the hash were to be
recomputed at that point, it would be different from the original hash
computed during storing. This breaks the caching mechanism.
Since the input files that are written to the repository by the
CalcJob
are derivatives of the input nodes (which are included in the hash), and
therefore do not semantically add anything to the provenance, we can
simply ignore them. Since this only applies to the
CalcJobNode
, wesimply override the method in that sub class and omit the repository.
N.B.: in this commit we also loosen the condition of when a completed
process node is considered to be a valid cache. Before only processes
that finished successfully (i.e. with process state
finished
and anexit status
0
) were considered valid caches. This is loosened to alsoaccept non-zero exit statuses. The new rule then considers all processes
that have
finished
as a valid cache, excluding onlyexcepted
andkilled
processes.