
Nextflow doesn't capture Fusion exit codes from Google Batch API #6481

@ejseqera

Bug report

When running Google Batch jobs with Fusion enabled, tasks that fail with Fusion exit code 174 or 175 are not reported correctly by Nextflow. Instead of showing the actual exit code (for example 175), Nextflow displays the exit status as - or unknown.

Unlike the AWS Batch and Azure Batch implementations, the GoogleBatchTaskHandler reads the exit code only from the .exitcode file and does not use the exit code provided by the Google Batch API response:

task.exitStatus = readExitFile()

When Fusion fails with exit code 175, the .exitcode file may not be written, so readExitFile() returns Integer.MAX_VALUE, which is displayed as - in the Nextflow log. The Google Batch API does provide the exit code via lastEvent?.taskExecution?.exitCode, but that value is currently only logged and is not used to set the task exit status.
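The fallback described above can be modelled in a few lines. This is only a sketch in Python, not the Groovy code in GoogleBatchTaskHandler: resolve_exit_status and format_exit_status are hypothetical names, and the order of preference (the .exitcode file first, then the API value) is an assumption about how a fix might behave.

```python
from typing import Optional

# Sentinel mirroring Java's Integer.MAX_VALUE, which readExitFile() is
# reported to return when the .exitcode file was not written.
INT_MAX = 2**31 - 1


def resolve_exit_status(exit_file_status: int, api_exit_code: Optional[int]) -> int:
    """Hypothetical helper: prefer the .exitcode value, else the Batch API value."""
    if exit_file_status != INT_MAX:
        # The .exitcode file was written, so trust it.
        return exit_file_status
    if api_exit_code is not None:
        # No .exitcode file: fall back to the exit code reported by the API.
        return api_exit_code
    # Neither source is available: keep the "unknown" sentinel.
    return INT_MAX


def format_exit_status(status: int) -> str:
    """Render the status the way the log does: '-' stands for the sentinel."""
    return "-" if status == INT_MAX else str(status)


# Missing .exitcode file, API reports 175.
print(format_exit_status(resolve_exit_status(INT_MAX, 175)))
# Missing .exitcode file and no API value, which is what the issue observes today.
print(format_exit_status(resolve_exit_status(INT_MAX, None)))
```

With a fallback of this kind, a task that never writes .exitcode would surface the API value instead of the sentinel, while tasks that do write .exitcode would be unaffected.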

Expected behavior and actual behavior

Expected behaviour:

When a task fails with exit code 175 (or 174, or any other exit code returned by the batch API), Nextflow should:

  1. Capture and display the actual exit code from the Google Batch API
  2. Report the task as failed with the correct exit status
  3. Behave consistently with the AWS Batch and Azure Batch executors

Actual behaviour:
  1. Task exits with Fusion code 175 in Google Batch
  2. Nextflow reports the exit status as - (which represents Integer.MAX_VALUE internally)
  3. The actual exit code is logged but not used

Steps to reproduce the problem

  1. Run a workflow on Google Batch whose process script contains exit 175
  2. The Batch job fails and returns the exit code
  3. Nextflow does not report this exit code

Program output

Oct-13 13:05:07.903 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` - last event: description: "Job state is set from RUNNING to FAILED for job projects/687213979415/locations/us-central1/jobs/nf-030f7468-1760359922905.Job failed due to task failure. Specifically, task with index 0 failed due to the following task event: \"Task state is updated from RUNNING to FAILED on zones/us-central1-a/instances/257915267378660956 with exit code 175.\""
event_time {
  seconds: 1760360691
  nanos: 186135096
}
type: "STATUS_CHANGED"
; exit code: 0
Oct-13 13:05:08.892 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 92; name: NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878); status: COMPLETED; exit: -; error: -; workDir: gs://scidev-testing-central/scratch/175EjyIui2YYrV/03/0f7468c945d7dc3ace3b04a39d16d9]
Oct-13 13:05:08.969 [TaskFinalizer-10] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878); work-dir=gs://scidev-testing-central/scratch/175EjyIui2YYrV/03/0f7468c945d7dc3ace3b04a39d16d9
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` terminated for an unknown reason -- Likely it has been terminated by the external system
Oct-13 13:05:08.996 [TaskFinalizer-10] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (NA12878)` terminated for an unknown reason -- Likely it has been terminated by the external system


Command executed:

  gatk --java-options "-Xmx24576M -XX:-UsePerfData" \
      MarkDuplicates \
      --INPUT NA12878.0005.bam --INPUT NA12878.0001.bam --INPUT NA12878.0003.bam --INPUT NA12878.0008.bam --INPUT NA12878.0007.bam --INPUT NA12878.0006.bam --INPUT NA12878.0009.bam --INPUT NA12878.0004.bam --INPUT NA12878.0010.bam --INPUT NA12878.0011.bam --INPUT NA12878.0002.bam --INPUT NA12878.0012.bam \
      --OUTPUT NA12878.md.bam \
      --METRICS_FILE NA12878.md.cram.metrics \
      --TMP_DIR . \
      --REFERENCE_SEQUENCE Homo_sapiens_assembly38.fasta \
      -REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT
  
  # If cram files are wished as output, the run samtools for conversion
  if [[ NA12878.md.cram == *.cram ]]; then
      samtools view -Ch -T Homo_sapiens_assembly38.fasta -o NA12878.md.cram NA12878.md.bam
      rm NA12878.md.bam
      samtools index NA12878.md.cram
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  -
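Note that in the first log line above the structured value is printed as "exit code: 0", while the event description text contains "with exit code 175". The sketch below (Python, with a hypothetical helper name) only illustrates that the code can be recovered from the description string as logged. Whether Nextflow should rely on that text, and whether its wording is stable, are assumptions this report does not verify.

```python
import re
from typing import Optional

# Matches the phrase seen in the logged event description.
EXIT_CODE_PATTERN = re.compile(r"with exit code (\d+)")


def exit_code_from_description(description: str) -> Optional[int]:
    """Hypothetical helper: extract the exit code from an event description."""
    match = EXIT_CODE_PATTERN.search(description)
    return int(match.group(1)) if match else None


# Description copied from the log line in this report.
description = (
    "Job state is set from RUNNING to FAILED for job "
    "projects/687213979415/locations/us-central1/jobs/nf-030f7468-1760359922905."
    "Job failed due to task failure. Specifically, task with index 0 failed due to "
    'the following task event: "Task state is updated from RUNNING to FAILED on '
    'zones/us-central1-a/instances/257915267378660956 with exit code 175."'
)

print(exit_code_from_description(description))
```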

Environment

  • Nextflow version: 25.09.1-edge
  • Executor: Google Batch
