Description
New feature
Include source context (process name or output name) and channel value in FilePublishEvent, allowing observers to filter and access metadata without manual event correlation.
Use case
When building plugins that notify external systems about published files (e.g., triggering an API when BAM files are ready), the notification typically needs:
- The published file path
- A selector to filter which publications matter (process name or output name)
- Associated metadata (sample ID, patient info, etc.)
Currently, FilePublishEvent only provides source, target, and labels. Getting the selector or metadata requires complex workarounds:
For publishDir files:
- Track task submissions/completions via onTaskSubmit/onTaskComplete
- Extract task hash by parsing the work directory path structure
- Maintain hash → metadata maps with synchronization for race conditions
For workflow output files:
- Buffer FilePublishEvents until onWorkflowOutput fires
- Retrospectively correlate files to channel values
- No direct link between the two event types
This results in ~400 lines of boilerplate for what could be a simple notification plugin.
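To illustrate the publishDir workaround described above, here is a minimal sketch of the hash-extraction step. It assumes the work directory layout is `<workDir>/<2 hex chars>/<30 hex chars>/<file>`; the helper name `taskHashFromPath` is hypothetical and not part of any Nextflow API.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: recover a task hash from a file inside a task work directory.
// Assumes the layout <workDir>/<2 hex chars>/<30 hex chars>/<file>.
String taskHashFromPath(Path workDir, Path file) {
    Path rel = workDir.relativize(file)
    // Need at least: prefix directory, hash directory, file name
    if (rel.nameCount < 3) return null
    String prefix = rel.getName(0).toString()
    String rest = rel.getName(1).toString()
    // Reject anything that does not look like a hashed task directory
    if (!(prefix ==~ /[0-9a-f]{2}/) || !(rest ==~ /[0-9a-f]{30}/)) return null
    return prefix + rest
}
```

An observer would then look the hash up in a thread-safe map populated from the task events, which is the state that the proposal aims to make unnecessary.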
Suggested implementation
Both publishDir and workflow outputs have a natural selector and associated channel value:
| Source | Selector | Channel Value |
|---|---|---|
| publishDir | Process name (ALIGNMENT) | Task output tuple |
| Workflow output | Output name ("samples" in docs example) | Published channel value |
Extend FilePublishEvent to include this context:
class FilePublishEvent {
Path source
Path target
List<String> labels
// Source context:
PublishSource sourceType // PROCESS or WORKFLOW_OUTPUT
String sourceName // "ALIGNMENT" or "alignedBams"
// The channel value containing this file:
Object value
}
enum PublishSource {
PROCESS,
WORKFLOW_OUTPUT
}
Multi-file channel values
Nextflow channels naturally group related items together - a BAM file with its index, paired-end reads, or a sample's outputs across multiple formats. This co-location is semantically meaningful: items in the same emission are related.
When a channel emission contains multiple files, each file currently generates its own event:
workflow {
main:
// science goes here
publish:
alignedBams = ALIGNMENT.out.map { meta, bam, bai ->
meta + [bamFile: bam, index: bai]
}
}
This would produce two events for each sample, even though the presence of both files in the same channel value strongly implies they are related:
| Event | target | value |
|---|---|---|
| 1 | results/sampleA.bam | [sampleId: "sampleA", bamFile: sampleA.bam, index: sampleA.bam.bai] |
| 2 | results/sampleA.bam.bai | [sampleId: "sampleA", bamFile: sampleA.bam, index: sampleA.bam.bai] |
Without the channel value, observers must reassemble this relationship - correlating events by parsing filenames or tracking state. Including the full value preserves the semantic grouping that the pipeline author intended:
- Related files (BAM + index) can be identified as siblings from the same emission
- All metadata is available without reconstruction
- The observer doesn't need to guess which files belong together
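As a rough sketch of how an observer could use the shared value to recover sibling files, the snippet below groups published targets by the channel value attached to each event. `PublishedFile` is a hypothetical stand-in for the proposed event, and the sketch assumes that events from the same emission carry equal values.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for the proposed event; only the fields used here
class PublishedFile {
    Path target
    String sourceName
    Object value
}

// Group published targets by the channel value they came from.
// Assumes events from the same emission carry equal values.
Map<Object, List<Path>> groupByValue(List<PublishedFile> events) {
    events.groupBy { it.value }.collectEntries { value, evs -> [(value): evs*.target] }
}
```

With the two events from the table above, this yields a single entry whose key is the sample's value and whose targets are the BAM file and its index.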
Simplified observer implementation
void onFilePublish(FilePublishEvent event) {
if (!matchesConfig(event.sourceName)) return
sendNotification([
file: event.target,
source: event.sourceName,
sourceType: event.sourceType,
metadata: event.value
])
}
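The `matchesConfig` helper above is left undefined. One possible shape, shown purely as an assumption, is a list of selectors with a simple `*` wildcard that is matched against `sourceName`. The selector syntax and the two-argument signature are illustrative and differ from the one-argument call in the example above.

```groovy
import java.util.regex.Pattern

// Hypothetical selector list, e.g. read from plugin configuration
List<String> selectors = ['ALIGNMENT', 'aligned*']

// Translate a simple '*' wildcard selector into a regex and test the source name
boolean matchesSelector(String selector, String sourceName) {
    String regex = selector.split(/\*/, -1).collect { Pattern.quote(it) }.join('.*')
    return sourceName ==~ regex
}

// True if any configured selector matches the process or output name
boolean matchesConfig(List<String> selectors, String sourceName) {
    selectors.any { matchesSelector(it, sourceName) }
}
```

Because process names and output names share the single `sourceName` field, one selector list can cover both publishDir and workflow output publications.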