checksum_states improvements #390
Conversation
… values of all files and doesn't recalculate them for each element of the state
Codecov Report
@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
+ Coverage   82.54%   82.56%   +0.01%
==========================================
  Files          19       19
  Lines        3827     3831       +4
  Branches     1045     1047       +2
==========================================
+ Hits         3159     3163       +4
  Misses        480      480
  Partials      188      188
I do think this is where you would use a separate LRU cache, so you can reuse hashes across tasks; I would recommend implementing a more general solution to this.

So you want to save hashes in files?

The easiest approach may indeed be a single "database" file in the cache directory that can be protected against concurrent writes by a softlock, but maintaining an LRU cache that way would be harder, since items would need to be shuffled. The alternative would be to create a BaseSpec class attribute that is set before anything is initialized. I don't know the implications for pickling, or how to achieve concurrent/async updates, etc., but I suspect this must have been looked at by people in the community.

It depends on how many things we want to fix with this PR. The part I was addressing here is run by the main node.

@djarecka - it's fine for this PR to fix the present issue, but I would suggest at least filing an issue for improving hashing efficiency.

@satra - oh, ok. I was double-checking this on openmind first and was planning to follow your suggestion, but it might be better to do that in a new PR that modifies all the places that use hashes. I will open an issue.
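A minimal sketch of the kind of shared, bounded cache discussed in this thread, keying hashes on (path, size, mtime) so that a modified file gets re-hashed; the names FileHashCache and hash_file are illustrative assumptions, not pydra's actual API, and the concurrency/persistence questions raised above are left out:

```python
# Sketch only: a process-local LRU cache for file content hashes that could be
# shared across tasks. Keys include size and mtime so a changed file is
# re-hashed automatically. Concurrent/persistent storage is not addressed here.
import hashlib
from collections import OrderedDict
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files never sit in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class FileHashCache:
    """Bounded LRU mapping from (path, size, mtime) to content hash."""

    def __init__(self, maxsize: int = 1024):
        self.maxsize = maxsize
        self._cache: "OrderedDict[tuple, str]" = OrderedDict()

    def get(self, path: Path) -> str:
        stat = path.stat()
        key = (str(path), stat.st_size, stat.st_mtime_ns)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = hash_file(path)
        self._cache[key] = value
        if len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)  # evict least recently used entry
        return value
```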
Acknowledgment
Types of changes
Summary
checksum_states no longer has to calculate the content hash of big files for every single element of the task state (the function is used in result); it now uses files_hash, which keeps track of all the files for the task (combining files from different state elements). A sketch of the idea follows below.
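A rough illustrative sketch of that idea, assuming files_hash is a per-task dict that maps a file path to its (mtime, hash) pair; the helper names below (cached_file_hash, _content_hash) are hypothetical, not the exact functions added in this PR:

```python
# Sketch only: a per-task cache so a big file is hashed once, not once per
# state element. ``files_hash`` here is assumed to be a plain dict owned by
# the task; the helper names are illustrative, not pydra's actual API.
import hashlib
from pathlib import Path


def _content_hash(path: Path) -> str:
    """Compute the sha256 of a file's contents, reading in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_file_hash(path: Path, files_hash: dict) -> str:
    """Return the file's hash, recomputing it only if the file changed on disk."""
    mtime = path.stat().st_mtime_ns
    entry = files_hash.get(str(path))
    if entry is not None and entry[0] == mtime:
        return entry[1]  # reuse the hash computed for another state element
    digest = _content_hash(path)
    files_hash[str(path)] = (mtime, digest)
    return digest
```

With a cache like this, a file that appears in many state elements is read and hashed once per task rather than once per element.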
Checklist
(we are using black: you can pip install pre-commit, run pre-commit install in the pydra directory, and black will be run automatically with each commit)