Closed
Description
A casualty of the new hashing framework introduced by pydra#662 was the removal of file-hash caching (calculating the file hash only once per task). For large files this could be a significant performance regression, so it would be good to work out how to add it back in.
Suggestions
- Just cache the checksum in the task object (if we place guards against it changing post-execution, see pydra#681, this might be sufficient).
- Return the file mtime as part of bytes_repr and use it to key a local cache. This mapping could potentially be stored on disk for persistence between runs.