vg giraffe hangs or stalls when reading large gzipped FASTQ files #4645

@linsson

1. What were you trying to do?

I was trying to map paired-end reads from .fastp.fastq.gz files to a tomato pangenome graph using vg giraffe.
Command structure (called inside a Bash job submitted via bsub):

vg giraffe -t 48 -Z graph.gbz -d graph.dist -m graph.min \
  -f sample_R1.fastp.fastq.gz -f sample_R2.fastp.fastq.gz \
  -N SAMPLE -R SAMPLE

2. What did you want to happen?

I expected vg giraffe to process the reads and output a .gam file with the alignments.

3. What actually happened?

In many jobs, the process stalls indefinitely. Symptoms include:

  • The .gam file remains empty (0 bytes) even after more than an hour.
  • The vg giraffe process appears in ps output with status Dl: D is uninterruptible sleep (typically I/O wait) and l marks a multi-threaded process.
  • In some cases, vg giraffe hangs immediately after start; in others, it starts writing but then stops progressing.
  • Manual inspection with lsof confirms that both FASTQ files are open.
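
To see whether all threads are stuck in D state or only some of them, the per-thread states can be tallied from /proc. This is a minimal sketch, assuming a Linux /proc filesystem; thread_states is a hypothetical helper name and is not part of vg.

```shell
# Count the threads of a process by scheduler state (R, S, D, ...).
# Usage: thread_states <pid>
# Assumes Linux: each thread has /proc/<pid>/task/<tid>/status with a "State:" line.
thread_states() {
  for f in /proc/"$1"/task/*/status; do
    # Skip threads that exited between the glob and the read,
    # and the unexpanded glob itself when the pid does not exist.
    [ -r "$f" ] || continue
    awk '/^State:/ {print $2}' "$f"
  done | sort | uniq -c
}
```

Calling thread_states with the PID of the stalled vg giraffe process prints one line per state with a thread count. If the stall is in file I/O, many threads in D would be the expected pattern, though this alone does not show which file or syscall is involved.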

The files are on a GPFS (IBM Spectrum Scale) filesystem, but reading them manually with zcat is instantaneous.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

No crash or stack trace is produced, only hanging.

5. What data and command can the vg dev team use to make the problem happen?

I cannot share the full tomato data because of its size and privacy constraints, but the problem appears reproducible when:

  • Reading gzipped .fastq.gz files > 5 GB each (paired-end).
  • Files are stored on a GPFS filesystem.
  • The command is run with high thread count (e.g. -t 48).
  • The command is launched inside a batch job (e.g. via bsub).
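
To separate filesystem behaviour from vg itself under the conditions listed above, the same file can be fully decompressed from inside the same kind of batch job and timed. This is a rough sketch; gz_read_check is a hypothetical helper name, and it only measures a single sequential reader, which is an assumption that does not match a 48-thread run.

```shell
# Decompress a gzipped file, discard the data, and report the
# uncompressed byte count and elapsed wall-clock seconds.
# Usage: gz_read_check <file.gz>
gz_read_check() {
  start=$(date +%s)
  # wc -c may pad its output with spaces; arithmetic expansion below normalizes it.
  bytes=$(gzip -dc "$1" | wc -c)
  end=$(date +%s)
  echo "file=$1 bytes=$((bytes)) seconds=$((end - start))"
}
```

Running gz_read_check sample_R1.fastp.fastq.gz in an interactive shell and again in a bsub job gives two timings to compare. If the batch run is much slower or never finishes, the stall is likely below vg; if both finish quickly, the interaction with vg giraffe's own reading is the more likely place to look.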

6. What does running vg version say?

v> vg version
vg [warning]: System's vm.overcommit_memory setting is 2 (never overcommit). vg does not work well under these conditions; you may appear to run out of memory with plenty of memory left. Attempting to unsafely reconfigure jemalloc to deal better with this situation.
vg version v1.66.0 "Navetta"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by fokamoto@mustard
