Enrich FilePublishEvent with source context #6680

@robsyme

New feature

Include source context (process name or output name) and channel value in FilePublishEvent, allowing observers to filter and access metadata without manual event correlation.

Use case

When building plugins that notify external systems about published files (e.g., triggering an API when BAM files are ready), the notification typically needs:

  1. The published file path
  2. A selector to filter which publications matter (process name or output name)
  3. Associated metadata (sample ID, patient info, etc.)

Currently, FilePublishEvent only provides source, target, and labels. Getting the selector or metadata requires complex workarounds:

For publishDir files:

  • Track task submissions/completions via onTaskSubmit/onTaskComplete
  • Extract task hash by parsing the work directory path structure
  • Maintain hash → metadata maps with synchronization for race conditions
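
To make the publishDir workaround concrete, here is a minimal sketch of the hash → metadata bookkeeping it implies. It assumes the work directory layout is <workDir>/<2-character prefix>/<rest of hash>/<files>. The class name TaskMetadataIndex and its methods are hypothetical, not part of Nextflow. It does not handle the case where a publish event arrives before the task has been recorded, which is one of the race conditions mentioned above.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: none of these names come from Nextflow itself.
class TaskMetadataIndex {
    // Concurrent map, since task events and publish events may arrive on different threads.
    private final Map<String, Map<String, Object>> byHash = new ConcurrentHashMap<>();
    private final Path workDir;

    TaskMetadataIndex(Path workDir) {
        this.workDir = workDir.normalize();
    }

    // Assumed layout: <workDir>/<2-char prefix>/<rest of hash>/<file...>.
    // Returns "<prefix>/<rest>" or empty if the path does not fit that layout.
    static Optional<String> hashOf(Path workDir, Path file) {
        Path base = workDir.normalize();
        Path f = file.normalize();
        if (!f.startsWith(base)) return Optional.empty();
        Path rel = base.relativize(f);
        if (rel.getNameCount() < 3) return Optional.empty();
        return Optional.of(rel.getName(0) + "/" + rel.getName(1));
    }

    // Call from the task-completion callback with the task's work directory.
    void record(Path taskDir, Map<String, Object> metadata) {
        // A placeholder child name lets the task directory reuse hashOf.
        hashOf(workDir, taskDir.resolve("_")).ifPresent(h -> byHash.put(h, metadata));
    }

    // Call from the publish callback with the event's source path.
    Optional<Map<String, Object>> lookup(Path publishedSource) {
        return hashOf(workDir, publishedSource).map(byHash::get);
    }
}
```

Even this reduced version needs path parsing and shared mutable state, which is the overhead the proposal aims to remove.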

For workflow output files:

  • Buffer FilePublishEvents until onWorkflowOutput fires
  • Retrospectively correlate files to channel values
  • No direct link between the two event types
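
The buffering workaround for workflow outputs can be sketched roughly as follows. This assumes the channel values delivered with the workflow output reference the same target paths as the buffered publish events, and that values are nested maps and collections. OutputCorrelator and its methods are hypothetical names for illustration only.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: buffers published targets until channel values are known.
class OutputCorrelator {
    private final List<Path> buffered = new ArrayList<>();

    // Call for every publish event; nothing can be sent yet.
    synchronized void onFilePublished(Path target) {
        buffered.add(target);
    }

    // Call once the output's values arrive; maps each buffered target to the
    // first value that contains it, then clears the buffer.
    synchronized Map<Path, Object> correlate(List<?> values) {
        Map<Path, Object> result = new LinkedHashMap<>();
        for (Path target : buffered) {
            for (Object value : values) {
                if (contains(value, target)) {
                    result.put(target, value);
                    break;
                }
            }
        }
        buffered.clear();
        return result;
    }

    // Recursively searches maps and collections for the given path.
    private static boolean contains(Object value, Path target) {
        if (value instanceof Path) return value.equals(target);
        if (value instanceof Map) {
            return ((Map<?, ?>) value).values().stream().anyMatch(v -> contains(v, target));
        }
        if (value instanceof Collection) {
            return ((Collection<?>) value).stream().anyMatch(v -> contains(v, target));
        }
        return false;
    }
}
```

Notifications are delayed until the values arrive, and the match depends on searching each value for a path, so the link between file and value is inferred rather than given.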

This results in ~400 lines of boilerplate for what could be a simple notification plugin.

Suggested implementation

Both publishDir and workflow outputs have a natural selector and associated channel value:

Source            Selector                                  Channel value
publishDir        Process name (ALIGNMENT)                  Task output tuple
Workflow output   Output name ("samples" in docs example)   Published channel value

Extend FilePublishEvent to include this context:

class FilePublishEvent {
    Path source
    Path target
    List<String> labels

    // Source context:
    PublishSource sourceType   // PROCESS or WORKFLOW_OUTPUT
    String sourceName          // "ALIGNMENT" or "alignedBams"

    // The channel value containing this file:
    Object value
}

enum PublishSource {
    PROCESS,
    WORKFLOW_OUTPUT
}

Multi-file channel values

Nextflow channels naturally group related items together - a BAM file with its index, paired-end reads, or a sample's outputs across multiple formats. This co-location is semantically meaningful: items in the same emission are related.

When a channel emission contains multiple files, each file currently generates its own event:

workflow {
  main:
    // science goes here
  publish:
    alignedBams = ALIGNMENT.out.map { meta, bam, bai ->
        meta + [bamFile: bam, index: bai]
    }
}

This produces two events per sample, even though the BAM file and its index arrive together in a single channel value, which strongly implies they are related:

Event   target                    value
1       results/sampleA.bam       [sampleId: "sampleA", bamFile: sampleA.bam, index: sampleA.bam.bai]
2       results/sampleA.bam.bai   [sampleId: "sampleA", bamFile: sampleA.bam, index: sampleA.bam.bai]

Without the channel value, observers must reassemble this relationship - correlating events by parsing filenames or tracking state. Including the full value preserves the semantic grouping that the pipeline author intended:

  • Related files (BAM + index) can be identified as siblings from the same emission
  • All metadata is available without reconstruction
  • The observer doesn't need to guess which files belong together
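
As an illustration of the sibling point, the sketch below groups events by their shared value. The Event class is a stand-in for the proposed FilePublishEvent with only the two fields needed here, and SiblingGrouping is a hypothetical name. It assumes channel values have meaningful equality, as maps and lists do.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SiblingGrouping {
    // Stand-in for the proposed event, reduced to target and value.
    static final class Event {
        final Path target;
        final Object value;

        Event(Path target, Object value) {
            this.target = target;
            this.value = value;
        }
    }

    // Files whose events carry equal values came from the same emission.
    static Map<Object, List<Path>> groupByValue(List<Event> events) {
        Map<Object, List<Path>> groups = new LinkedHashMap<>();
        for (Event e : events) {
            groups.computeIfAbsent(e.value, k -> new ArrayList<>()).add(e.target);
        }
        return groups;
    }
}
```

With the value attached to each event, identifying siblings becomes a single grouping step with no filename parsing.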

Simplified observer implementation

void onFilePublish(FilePublishEvent event) {
    if (!matchesConfig(event.sourceName)) return

    sendNotification([
        file: event.target,
        source: event.sourceName,
        sourceType: event.sourceType,
        metadata: event.value
    ])
}
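
The matchesConfig call above is left undefined. One possible shape, shown here as a sketch, is a list of configured selectors where '*' matches any run of characters and everything else is literal. SourceSelector and the selector syntax are assumptions for illustration, not an existing Nextflow feature.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical selector filter for process names or output names.
class SourceSelector {
    private final List<Pattern> patterns;

    SourceSelector(List<String> selectors) {
        this.patterns = selectors.stream()
                .map(SourceSelector::compile)
                .collect(Collectors.toList());
    }

    // Converts a selector to a regex: '*' becomes ".*", the rest is quoted literally.
    private static Pattern compile(String selector) {
        String[] parts = selector.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole name matches at least one configured selector.
    boolean matches(String sourceName) {
        if (sourceName == null) return false;
        return patterns.stream().anyMatch(p -> p.matcher(sourceName).matches());
    }
}
```

A plugin configured with the selectors "ALIGN*" and "alignedBams" would then accept events from the ALIGNMENT process and the alignedBams output and ignore everything else.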
