Description
Bug report
When running Google Batch jobs with Fusion enabled, tasks that fail with Fusion exit code 174 or 175 are not reported correctly by Nextflow. Instead of showing the actual exit code (e.g. 175), Nextflow displays the exit status as - (unknown).
The GoogleBatchTaskHandler only reads the exit code from the .exitcode file and does not use the exit code provided by the Google Batch API response, unlike the AWS Batch and Azure Batch implementations.
task.exitStatus = readExitFile()
When Fusion fails with exit code 175, the .exitcode file may not be written, causing readExitFile() to return Integer.MAX_VALUE, which is displayed as - in the Nextflow log. The Google Batch API does provide the exit code via lastEvent?.taskExecution?.exitCode, but it is currently only logged and not used to set the task exit status.
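The fallback described above could look roughly like the following. This is a minimal, standalone Java sketch rather than the actual GoogleBatchTaskHandler code (which is Groovy): the class `ExitStatusFallback`, the method `resolveExitStatus`, and the use of a nullable `Integer` for "the API reported no exit code" are all hypothetical names and assumptions introduced for illustration.

```java
public class ExitStatusFallback {

    // Sentinel for "exit code unknown"; per this report, readExitFile()
    // returns Integer.MAX_VALUE when the .exitcode file is missing.
    static final int UNKNOWN = Integer.MAX_VALUE;

    /**
     * Hypothetical helper: prefer the value read from the .exitcode file,
     * and fall back to the exit code reported by the batch API only when
     * the file value is unknown.
     *
     * @param exitFileValue value read from .exitcode (UNKNOWN if missing)
     * @param apiExitCode   exit code from the batch API, or null if absent
     */
    static int resolveExitStatus(int exitFileValue, Integer apiExitCode) {
        if (exitFileValue != UNKNOWN) {
            return exitFileValue;      // the file is the primary source
        }
        if (apiExitCode != null) {
            return apiExitCode;        // fall back to the API-provided code
        }
        return UNKNOWN;                // nothing available: stays unknown
    }

    public static void main(String[] args) {
        // .exitcode missing, API reports 175
        System.out.println(resolveExitStatus(UNKNOWN, 175));
        // .exitcode present: the file value wins
        System.out.println(resolveExitStatus(1, 175));
        // neither source available
        System.out.println(resolveExitStatus(UNKNOWN, null) == UNKNOWN);
    }
}
```

One caveat for a real fix: in the log excerpt in this report, the last event is a STATUS_CHANGED event and is printed with `exit code: 0`, while the value 175 only appears in the event description. A fallback may therefore need to check that the event actually carries task execution data before trusting its exit code, so that a failed task is not reported with exit status 0.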
Expected behavior and actual behavior
Expected behaviour:
When a task fails with exit code 175 (or 174, or any other exit code returned by the batch API), Nextflow should:
- Capture and display the actual exit code from the Google Batch API
- Report the task as failed with the correct exit status
- Behave consistently with the AWS Batch and Azure Batch executors
Actual behaviour:
- Task exits with Fusion code 175 in Google Batch
- Nextflow reports the exit status as - (which represents Integer.MAX_VALUE internally)
- The actual exit code is logged but not used
Steps to reproduce the problem
- Run a workflow with exit 175 in the process script on Google Batch
- Batch job will return exit code and fail
- Nextflow will fail to return this exit code
Program output
Oct-13 13:05:07.903 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` - last event: description: "Job state is set from RUNNING to FAILED for job projects/687213979415/locations/us-central1/jobs/nf-030f7468-1760359922905.Job failed due to task failure. Specifically, task with index 0 failed due to the following task event: \"Task state is updated from RUNNING to FAILED on zones/us-central1-a/instances/257915267378660956 with exit code 175.\""
event_time {
seconds: 1760360691
nanos: 186135096
}
type: "STATUS_CHANGED"
; exit code: 0
Oct-13 13:05:08.892 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 92; name: NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878); status: COMPLETED; exit: -; error: -; workDir: gs://scidev-testing-central/scratch/175EjyIui2YYrV/03/0f7468c945d7dc3ace3b04a39d16d9]
Oct-13 13:05:08.969 [TaskFinalizer-10] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878); work-dir=gs://scidev-testing-central/scratch/175EjyIui2YYrV/03/0f7468c945d7dc3ace3b04a39d16d9
error [nextflow.exception.ProcessFailedException]: Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` terminated for an unknown reason -- Likely it has been terminated by the external system
Oct-13 13:05:08.996 [TaskFinalizer-10] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)'
Caused by:
Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` terminated for an unknown reason -- Likely it has been terminated by the external system
Command executed:
gatk --java-options "-Xmx24576M -XX:-UsePerfData" \
MarkDuplicates \
--INPUT NA12878.0005.bam --INPUT NA12878.0001.bam --INPUT NA12878.0003.bam --INPUT NA12878.0008.bam --INPUT NA12878.0007.bam --INPUT NA12878.0006.bam --INPUT NA12878.0009.bam --INPUT NA12878.0004.bam --INPUT NA12878.0010.bam --INPUT NA12878.0011.bam --INPUT NA12878.0002.bam --INPUT NA12878.0012.bam \
--OUTPUT NA12878.md.bam \
--METRICS_FILE NA12878.md.cram.metrics \
--TMP_DIR . \
--REFERENCE_SEQUENCE Homo_sapiens_assembly38.fasta \
-REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT
# If cram files are wished as output, the run samtools for conversion
if [[ NA12878.md.cram == *.cram ]]; then
samtools view -Ch -T Homo_sapiens_assembly38.fasta -o NA12878.md.cram NA12878.md.bam
rm NA12878.md.bam
samtools index NA12878.md.cram
fi
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES":
gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS
Command exit status:
-
Environment
- Nextflow version: 25.09.1-edge
- Executor: Google Batch