MNT: Reimplement file-hash caching in new hashing framework #683

Closed
@tclose

Description

A casualty of the new hashing framework introduced by pydra#662 was the removal of file-hash caching (only calculating the file hash once per task). For large files this could be a significant performance regression, so it would be good to work out how to add it back in.

Suggestions

  1. Just cache the checksum in the task object (if we place guards on it changing post-execution, see pydra#681, this might be sufficient).
  2. Return the file mtime as part of bytes_repr and use it to build a local cache. This mapping could potentially be stored on disk for persistence between runs.
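
To make the second suggestion concrete, here is a minimal sketch of an mtime-keyed hash cache with optional on-disk persistence. It is not pydra's implementation: `FileHashCache`, its methods, the JSON store and the choice of BLAKE2b are all hypothetical, and the file size is added to the key as an assumption to reduce the risk of stale hits on filesystems with coarse mtime resolution.

```python
import hashlib
import json
import os
from pathlib import Path


class FileHashCache:
    """Hypothetical helper (not part of pydra): caches file digests
    keyed by absolute path, mtime and size."""

    def __init__(self, store=None):
        # Optional JSON file used to persist the mapping between runs.
        self.store = Path(store) if store is not None else None
        self.entries = {}
        self.computed = 0  # how many times a file was actually read and hashed
        if self.store is not None and self.store.exists():
            self.entries = json.loads(self.store.read_text())

    @staticmethod
    def _key(path):
        # One stat call is cheap compared with reading a large file.
        st = os.stat(path)
        return f"{os.path.abspath(path)}|{st.st_mtime_ns}|{st.st_size}"

    def digest(self, path):
        key = self._key(path)
        cached = self.entries.get(key)
        if cached is not None:
            return cached
        # Cache miss: hash the contents in 1 MiB chunks.
        hasher = hashlib.blake2b(digest_size=16)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                hasher.update(chunk)
        self.computed += 1
        self.entries[key] = hasher.hexdigest()
        return self.entries[key]

    def save(self):
        # Write the mapping to disk so a later run can reuse it.
        if self.store is not None:
            self.store.write_text(json.dumps(self.entries))
```

An instance created without a `store` and held on the task object would roughly correspond to the first suggestion. Entries for old mtimes are never evicted in this sketch, so a real implementation would probably want some form of pruning.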

Metadata


Labels

maintenance (Refactors and improvements to code quality.)
