Feature request: Allow customizable task hashing #6683

@kaizhang

New feature

During workflow development, minor changes to a process script (e.g., adding debug echo statements, refactoring commands, or commenting out lines) invalidate the task cache, even when inputs and outputs remain functionally identical. This forces unnecessary re-execution of tasks on every -resume run, significantly slowing down iterative development.

It would be very useful to have a way to customize how task hashes are computed, particularly to optionally exclude certain components (like the process script content) from the hash calculation in specific scenarios.

Use case

  • Rapid prototyping and debugging of pipelines.
  • Avoiding re-runs of long tasks just because a cat or echo was added for inspection.
  • Complementing existing caching mechanisms without breaking reproducibility in final runs.

Suggested implementation

Introduce a new process directive (or extend the existing cache directive) that allows users to control which elements contribute to the task hash. For example:

process myProcess {
    cache { task ->
        // Return a custom object/map that contributes to the hash
        [name: task.process, inputs: task.inputs]  // explicitly exclude script, container, etc.
    }
    // ...
}
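
To make the intended behavior concrete, here is a minimal Python sketch of selective hashing. It is only an illustration of the idea, not how Nextflow computes task hashes: the `task_hash` function, the component names, and the sample values are all hypothetical.

```python
import hashlib
import json


def task_hash(components, include=None):
    """Hash only the selected components of a task (hypothetical helper).

    components: dict mapping component name -> JSON-serializable value
    include:    names that contribute to the hash (None means all of them)
    """
    keys = sorted(components if include is None else include)
    selected = {k: components[k] for k in keys}
    payload = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical task description; the values are made up for illustration.
base = {
    "name": "myProcess",
    "inputs": ["sample.fastq"],
    "script": "align.sh sample.fastq",
    "container": "example/image:1.0",
}

# Same task, but with a debug echo added to the script.
edited = dict(base, script="echo debug\nalign.sh sample.fastq")

# Hashing every component: the script edit changes the hash.
full_changed = task_hash(base) != task_hash(edited)

# Hashing only name and inputs: the script edit is ignored.
custom_same = task_hash(base, ["name", "inputs"]) == task_hash(edited, ["name", "inputs"])
```

With the default (all components), the edited script yields a different hash and the task would re-run. With only `name` and `inputs` selected, both versions hash identically, which is the cache-hit behavior requested above.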
