
Pipeline does not stop immediately when non-unique sample IDs are provided in the samplesheet #1674

Open

Description of the bug

When running the nf-core/sarek pipeline with a samplesheet in which sample IDs are not unique, the pipeline does not stop at startup; it runs and fails only after the alignment step (I believe at the GATK4_MARKDUPLICATES process).

After carefully checking the nf-core/sarek documentation, I realised that sample IDs must be unique:

Custom sample ID for each tumor and normal sample; more than one tumor sample for each subject is possible, i.e. a tumor and a relapse; samples can have multiple lanes for which the same ID must be used to merge them later (see also lane). Sample IDs must be unique for unique biological samples

It would be useful to add a check that stops the pipeline at startup when non-unique sample IDs are provided.
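To make the request concrete, here is a minimal sketch of the kind of check I have in mind, written as a standalone Python pre-flight script rather than as pipeline code. It assumes that "non-unique" means one sample ID being shared by more than one patient, while the same sample ID repeated across lanes of a single patient stays valid. The function names `find_shared_sample_ids` and `check_samplesheet` are hypothetical and are not part of nf-core/sarek.

```python
import csv
import io
from collections import defaultdict


def find_shared_sample_ids(samplesheet_text):
    """Return {sample_id: sorted patients} for sample IDs used by more than one patient."""
    patients_by_sample = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        patients_by_sample[row["sample"]].add(row["patient"])
    return {
        sample: sorted(patients)
        for sample, patients in patients_by_sample.items()
        if len(patients) > 1
    }


def check_samplesheet(samplesheet_text):
    """Raise ValueError before any work starts if a sample ID is shared across patients."""
    shared = find_shared_sample_ids(samplesheet_text)
    if shared:
        details = "; ".join(
            f"{sample!r} is used by patients {patients}"
            for sample, patients in shared.items()
        )
        raise ValueError(f"Non-unique sample IDs in samplesheet: {details}")
```

Calling `check_samplesheet(open("samplesheet.csv").read())` before launching the run would raise immediately for a sheet in which several patients share one sample ID, instead of failing after alignment.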

Samplesheet used:

patient,status,sample,lane,fastq_1,fastq_2
F2407718,0,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407718_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407718_R2_combined.fastq.gz
F2407719,0,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407719_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407719_R2_combined.fastq.gz
F2407720,0,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407720_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407720_R2_combined.fastq.gz
F2407721,1,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407721_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407721_R2_combined.fastq.gz
F2407722,1,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407722_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407722_R2_combined.fastq.gz
F2407723,1,monolayer,L1,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407723_R1_combined.fastq.gz,/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/F2407723_R2_combined.fastq.gz
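In the samplesheet above, all six patients share the sample ID `monolayer`. One possible way to make the IDs unique is to prefix each sample ID with its patient ID. The sketch below does that in Python; it is only an illustration under that assumption, not something the sarek documentation prescribes, and the function name `prefix_sample_with_patient` is hypothetical.

```python
import csv
import io


def prefix_sample_with_patient(samplesheet_text):
    """Return the samplesheet with each sample ID rewritten as '<patient>_<sample>'."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    out = io.StringIO()
    # Keep the original column order and use plain "\n" line endings.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["sample"] = f'{row["patient"]}_{row["sample"]}'
        writer.writerow(row)
    return out.getvalue()
```

Applied to the sheet above, the first row's sample ID would become `F2407718_monolayer`, the second `F2407719_monolayer`, and so on, while all other columns stay unchanged.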

Parameters file used:

{
    "input": "/mnt/titan/mcasanova/ctDNA/01_samples/CHUgeneve/CHUgeneve_samplelist.csv",
    "wes": true,
    "tools": "mutect2,vep,snpeff",
    "aligner": "bwa-mem2",
    "vep_custom_args": "everything",
    "save_mapped": true,
    "save_output_as_bam": true,
    "genome": "GATK.GRCh38",
    "save_reference": true
}

Command used and terminal output

$ nextflow run nf-core/sarek -profile docker --outdir ./results -resume -params-file nf-param.json -with-report -with-trace -with-dag flowchart.png -with-timeline timeline.html

nextflow.exception.AbortOperationException: Detected join operation duplicate emission on left channel -- offending element: key=[patient:F2407719, sample:monolayer, sex:NA, status:0, n_fastq:12, data_type:bam, id:monolayer]; value=/mnt/titan/mcasanova/ctDNA/03_analysis_CHU/work/7f/5a1152a34ee73b2a2262a119788472/monolayer.sorted.bam
	at nextflow.extension.JoinOp.checkForDuplicate(JoinOp.groovy:266)
	at nextflow.extension.JoinOp.join0(JoinOp.groovy:177)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:645)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:628)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:82)
	at nextflow.extension.JoinOp$_handler_closure1.doCall(JoinOp.groovy:117)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.extension.DataflowHelper$_subscribeImpl_closure2.doCall(DataflowHelper.groovy:287)
	at jdk.internal.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at groovy.lang.Closure.call(Closure.java:433)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
	at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
	at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
	at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
	at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
	at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
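The exception above reports that a `join` saw the same key more than once on its left channel. The Python sketch below only illustrates what such a duplicate check means in general; it is not Nextflow's implementation, and this report does not establish how sarek ends up emitting the repeated key. The function name `join_fail_on_duplicate` and the tuple keys are assumptions made for the example.

```python
def join_fail_on_duplicate(left, right):
    """Join two lists of (key, value) pairs, refusing a key repeated on either side."""

    def index(pairs, side):
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(
                    f"duplicate emission on {side} channel: key={key}; value={value}"
                )
            seen[key] = value
        return seen

    left_index = index(left, "left")
    right_index = index(right, "right")
    # Emit one (key, left_value, right_value) tuple per key present on both sides.
    return [
        (key, value, right_index[key])
        for key, value in left_index.items()
        if key in right_index
    ]
```

With two left-hand entries that carry the same key, for example two BAM files both keyed by `("F2407719", "monolayer")`, the function raises instead of silently pairing the wrong files, which is the kind of failure the stack trace shows.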

Relevant files

No response

System information

  • Nextflow version: 24.04.4
  • Hardware: Workstation
  • Executor: Local
  • Container engine: Docker
  • OS: Ubuntu Linux
  • Version of nf-core/sarek: v3.4.4-g5cc3049

Labels: bug (Something isn't working)
