Refactor postvariantcalling. Split out varlociraptor vs other options #2043
base: dev
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.3.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
Looking good
Waiting for a conclusion on the discussion here before updating all the checksums: https://nfcore.slack.com/archives/C05V9FRJYMV/p1761906884846409?thread_ts=1761564791.923049&cid=C05V9FRJYMV
Minor comments
// Set Varlociraptor reference files
varlociraptor_scenario_germline = params.varlociraptor_scenario_germline ? Channel.fromPath(params.varlociraptor_scenario_germline).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_germline.yte.yaml").collect()
varlociraptor_scenario_somatic = params.varlociraptor_scenario_somatic ? Channel.fromPath(params.varlociraptor_scenario_somatic).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_somatic.yte.yaml").collect()
varlociraptor_scenario_tumor_only = params.varlociraptor_scenario_tumor_only ? Channel.fromPath(params.varlociraptor_scenario_tumor_only).map { it -> [[id: it.baseName - '.yte'], it] }.collect() : Channel.fromPath("${projectDir}/assets/varlociraptor_tumor_only.yte.yaml").collect()
Can we put that directly on the call to Sarek workflow?
as you wish, thought it would be more new syntaxy to do it in the main?
basically this follows where currently the intervals and annotation files are defined.
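Wherever these definitions end up, the three near-identical ternaries could be folded into one small helper. The following is only a sketch: `scenarioChannel` is a hypothetical name that does not exist in the PR, and unlike the lines above it applies the `[meta, file]` mapping to the default asset as well, which is an assumption about what the downstream process expects rather than something this PR confirms.

```groovy
// Hypothetical helper (not part of this PR): build a value channel for a
// varlociraptor scenario file, falling back to a bundled default.
def scenarioChannel(customPath, defaultPath) {
    // Use the user-supplied scenario if set, otherwise the default asset.
    // Assumption: both branches should emit [meta, file]; the original only
    // adds the meta map when a custom scenario is provided.
    return Channel
        .fromPath(customPath ?: defaultPath)
        .map { it -> [[id: it.baseName - '.yte'], it] }
        .collect()
}

workflow {
    // Set Varlociraptor reference files
    varlociraptor_scenario_germline   = scenarioChannel(params.varlociraptor_scenario_germline,   "${projectDir}/assets/varlociraptor_germline.yte.yaml")
    varlociraptor_scenario_somatic    = scenarioChannel(params.varlociraptor_scenario_somatic,    "${projectDir}/assets/varlociraptor_somatic.yte.yaml")
    varlociraptor_scenario_tumor_only = scenarioChannel(params.varlociraptor_scenario_tumor_only, "${projectDir}/assets/varlociraptor_tumor_only.yte.yaml")
}
```

If the default branch is meant to stay a bare path, the `.map` step would need to be made conditional on `customPath` instead.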
{ assert snapshot(
    workflow.out.vcfs,
    workflow.out.tbis,
    workflow.out.versions
Can we capture the yaml output instead?
for the versions?
hold on what is the difference? this just snapshots the yaml, isn't that what we want? got an example of what you want instead?
Previously, the post-variant calling logic allowed different post-processing strategies (varlociraptor, normalization, concatenation) to be mixed in confusing ways. The main workflow had complex conditional logic to determine which VCFs were used based on which post-processing was requested, and a lot of computation was repeated on both raw and post-processed variants. This made the data flow hard to follow and error-prone.
Changes
Enforced either-or logic in post-variant calling:
Simplified main workflow:
Additional improvements:
Resume limitations with varlociraptor
While the overall data flow is cleaner, varlociraptor currently does not resume reliably. I marked the likely location where this occurs, but it is not yet clear to me why.