Error when FASTQ read headers are too long #245

@AntonS-bio

This is not a NanoSim bug as such, but it is very relevant for Nanopore data. Samtools limits the read header to 254 characters (https://github.com/samtools/samtools/issues/10810). NanoSim (v3.2.2) doesn't seem to check that the call to minimap2 in "read_analysis.py" completes without error. So when the input read file has read headers longer than 254 characters, samtools fails silently, NanoSim continues running, and it later throws this error:

./output/test_genome_alnm.bam
Traceback (most recent call last):
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 896, in <module>
    main()
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 606, in main
    alnm_ext, unaligned_length, strandness, unaligned_base_qualities = align_genome(in_fasta, prefix, aligner,
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 199, in align_genome
    unaligned_length, strandness, unaligned_base_quals = get_primary_sam.primary_and_unaligned(g_alnm, prefix, quantification, fastq=fastq)
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/get_primary_sam.py", line 188, in primary_and_unaligned
    strandness = float(pos_strand) / num_aligned
ZeroDivisionError: float division by zero
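
For anyone hitting the same traceback, here is a rough way to confirm that long headers are the cause before running NanoSim. It is only a sketch: `find_long_headers` is a hypothetical helper, not part of NanoSim, and it assumes plain 4-line FASTQ records (optionally gzipped). It measures the whole header line after the `@`, which is a conservative reading of the 254-character limit quoted above.

```python
import gzip

# Limit quoted in this issue; treated here as an assumption, not verified.
MAX_HEADER_LEN = 254


def find_long_headers(fastq_path, max_len=MAX_HEADER_LEN):
    """Return (record_number, header_length) for headers longer than max_len.

    Assumes 4-line FASTQ records; record numbers are 1-based.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    too_long = []
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0:
                continue  # only the first line of each record is a header
            header = line.rstrip("\r\n")[1:]  # drop the leading '@'
            if len(header) > max_len:
                too_long.append((line_no // 4 + 1, len(header)))
    return too_long
```

If the returned list is non-empty, shortening or renaming those reads before running "read_analysis.py" should avoid the failure described here.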

Adding a check of the return code to the calls to minimap2 (for example on line 171 in "read_analysis.py") would help users find and fix the problem with their data.
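
A minimal sketch of what such a check could look like. I have not looked at how NanoSim actually launches the aligner, so `run_checked` is a hypothetical wrapper around Python's standard `subprocess.run`, and the command list is whatever the caller already builds.

```python
import subprocess


def run_checked(cmd, **kwargs):
    """Run an external command and raise if it exits with a non-zero status.

    cmd is a list of arguments, e.g. the aligner command the caller builds.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, **kwargs)
    if result.returncode != 0:
        # Surface the tool's own stderr so the user can see what went wrong.
        raise RuntimeError(
            "Command failed with exit code {}: {}\n{}".format(
                result.returncode, " ".join(cmd), result.stderr.strip()
            )
        )
    return result
```

One caveat: if the aligner is run as a shell pipeline (for example piped into samtools), the exit status normally reflects only the last command in the pipe, so each stage would need to be checked separately or the shell's `pipefail` option enabled.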
