Contributor

@FriederikeHanssen FriederikeHanssen commented Oct 30, 2025

Previously, the post-variant calling logic allowed different post-processing strategies (varlociraptor, normalization, concatenation) to be mixed in confusing ways. The main workflow used complex conditional logic to determine which VCFs were used, depending on which post-processing was requested, and a lot of computation was repeated on both raw and post-processed variants. This made the data flow hard to follow and error-prone.

Changes

Enforced either-or logic in post-variant calling:

  • If varlociraptor is requested → process through varlociraptor workflows exclusively
  • Else if concatenate_vcfs or normalize_vcfs → perform standard post-processing
  • Else → pass through original VCFs unchanged
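
As a rough illustration of the either-or rule above, here is a minimal sketch in plain Python rather than Nextflow. The function name, argument names, and return labels are hypothetical and do not correspond to identifiers in the pipeline; the only thing taken from this PR is the order of precedence.

```python
def select_post_processing(varlociraptor=False, concatenate_vcfs=False, normalize_vcfs=False):
    """Return the single post-processing route for a run (illustrative names only)."""
    if varlociraptor:
        # Varlociraptor takes precedence and is never combined with the other routes.
        return "varlociraptor"
    if concatenate_vcfs or normalize_vcfs:
        # Standard post-processing: concatenation and/or normalization.
        return "standard"
    # Nothing requested: the original VCFs are passed through unchanged.
    return "passthrough"


# Requesting varlociraptor together with normalization still selects only varlociraptor.
print(select_post_processing(varlociraptor=True, normalize_vcfs=True))
```

Because exactly one route is returned, downstream steps only ever see one set of VCFs, which is the point of the simplification.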

Simplified main workflow:

  • Removed conditional VCF gathering logic from main SAREK workflow
  • POST_VARIANTCALLING now always outputs VCFs (either processed or pass-through)
  • Annotation always consumes POST_VARIANTCALLING.out.vcfs
  • VCF QC now runs exclusively on raw variant calls (before any post-processing)

Additional improvements:

  • Standardized varlociraptor scenario file handling in main.nf
  • Disabled redundant publishing of intermediate normalized files
  • Increased the MultiQC timeout to ensure all plots are published
  • Varlociraptor subworkflows now emit separate vcf and tbi outputs, matching the pattern used by other post-processing subworkflows
  • Added branching logic to handle single chunks without concatenation
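
The single-chunk branching can be pictured roughly as follows. This is a plain-Python sketch under assumptions, not the subworkflow code: the function name, the dictionary layout, and the file names are made up for illustration.

```python
def branch_by_chunk_count(chunked_vcfs):
    """Split samples into those with one chunk and those needing concatenation.

    chunked_vcfs maps a sample id to its list of per-chunk VCF paths
    (a hypothetical layout chosen for this example).
    """
    single, multiple = {}, {}
    for sample, chunks in chunked_vcfs.items():
        if len(chunks) == 1:
            # A lone chunk is already the final VCF, so concatenation is skipped.
            single[sample] = chunks[0]
        else:
            # Several chunks still have to be concatenated downstream.
            multiple[sample] = chunks
    return single, multiple


single, multiple = branch_by_chunk_count({
    "sample_a": ["a.chunk1.vcf.gz"],
    "sample_b": ["b.chunk1.vcf.gz", "b.chunk2.vcf.gz"],
})
print(single)
print(multiple)
```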

Resume limitations with varlociraptor

While the overall data flow is cleaner, varlociraptor currently does not resume reliably. I marked the likely location where this occurs, but it is not yet clear to me why.

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Member

@maxulysse maxulysse left a comment

Looking good

@FriederikeHanssen
Contributor Author

Waiting for a conclusion on the discussion here before updating all the checksums: https://nfcore.slack.com/archives/C05V9FRJYMV/p1761906884846409?thread_ts=1761564791.923049&cid=C05V9FRJYMV

@FriederikeHanssen FriederikeHanssen marked this pull request as ready for review November 3, 2025 18:34
Member

@maxulysse maxulysse left a comment

Minor comments

Comment on lines +171 to +174
// Set Varlociraptor reference files
varlociraptor_scenario_germline = params.varlociraptor_scenario_germline ? Channel.fromPath(params.varlociraptor_scenario_germline).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_germline.yte.yaml").collect()
varlociraptor_scenario_somatic = params.varlociraptor_scenario_somatic ? Channel.fromPath(params.varlociraptor_scenario_somatic).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_somatic.yte.yaml").collect()
varlociraptor_scenario_tumor_only = params.varlociraptor_scenario_tumor_only ? Channel.fromPath(params.varlociraptor_scenario_tumor_only).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_tumor_only.yte.yaml").collect()
Member

Can we put that directly on the call to Sarek workflow?

Contributor Author

as you wish, thought it would be more new syntaxy to do it in the main?

Contributor Author

basically this follows where currently the intervals and annotation files are defined.

{ assert snapshot(
workflow.out.vcfs,
workflow.out.tbis,
workflow.out.versions
Member

Can we capture the yaml output instead?

Contributor Author

for the versions?

Contributor Author

hold on what is the difference? this just snapshots the yaml, isn't that what we want? got an example of what you want instead?
