
Feature Request: include the line number in the error message when STAR deems a line malformed #1322

@cmatKhan

This is the command I am using:

  STAR \
      --genomeDir star \
      --readFilesIn 557_trimmed.fq.gz  \
      --runThreadN 16 \
      --outFileNamePrefix 557. \
      --sjdbGTFfile KN99_genome_fungidb_no_markers_NAT_G418.gtf \
      --outSAMattrRGline ID:557 'SM:557' \
      --quantMode TranscriptomeSAM \
      --twopassMode Basic \
      --outSAMtype BAM Unsorted \
      --readFilesCommand zcat \
      --runRNGseed 0 \
      --outFilterMultimapNmax 20 \
      --alignSJDBoverhangMin 1 \
      --outSAMattributes NH HI AS NM MD \
      --quantTranscriptomeBan Singleend

And this is the error:

  ReadAlignChunk_processChunks.cpp:161:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 
  
  Aug 08 20:14:56 ...... FATAL ERROR, exiting
  

I have seen the discussion of this in issue #379. I am using SLURM, not SGE, and --readFilesCommand zcat is already in the command.

I did examine the FASTQ file. I used an old tool called fastqValidator (does anybody have another they like? I haven't needed one before) as well as a nice awk one-liner suggested in issue #1101. Both reported the file as clean.
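
For reference, here is a sketch of that kind of structural check. It is not the exact one-liner from #1101, `check_fastq` is a made-up helper name, and it assumes gzip-compressed input (matching --readFilesCommand zcat) with strict four-line records and no wrapped sequence lines. It prints the line number of the first problem, which is roughly the information I'd like STAR's error to include.

```shell
#!/usr/bin/env bash
# Sketch of a minimal FASTQ structure check (hypothetical helper).
# Assumptions: gzip-compressed input and strict 4-line records.
# Prints the line number of the first problem and returns 1 on failure.
check_fastq() {
  gzip -cd -- "$1" | awk '
    # Line 1 of each record: the read ID must start with @
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: read ID does not start with @\n", NR; bad = 1; exit
    }
    # Line 2: remember the sequence length
    NR % 4 == 2 { seqlen = length($0) }
    # Line 3: the separator must start with +
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1; exit
    }
    # Line 4: the quality string must be as long as the sequence
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR == 0) { print "no lines read (empty file, or not gzip data?)"; exit 1 }
      if (NR % 4 != 0) { printf "truncated: %d lines is not a multiple of 4\n", NR; exit 1 }
      print "OK: " NR / 4 " records"
    }'
}
```

Usage: `check_fastq 557_trimmed.fq.gz`.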

I then looked for @ and > characters outside of the ID lines (none) and for @ and > anywhere in an ID line other than the first character (again, none).
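
A sketch of that scan, under the same assumptions as above (gzip input, strict four-line records; `scan_symbols` is a hypothetical name). One caveat: with Phred+33 encoding, quality lines can legitimately contain @ (Q31) and > (Q29), so hits on quality lines are not necessarily errors.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report @ or > anywhere except as the first
# character of a read ID line. Assumes gzip input and strict 4-line records.
# Phred+33 quality lines may legitimately contain @ and >, so treat hits on
# every 4th line with care.
scan_symbols() {
  gzip -cd -- "$1" | awk '
    # ID lines: check everything after the leading character
    NR % 4 == 1 && substr($0, 2) ~ /[@>]/ {
      printf "line %d: @ or > inside read ID\n", NR; n++
    }
    # Sequence, separator and quality lines
    NR % 4 != 1 && /[@>]/ {
      printf "line %d: @ or > outside read ID\n", NR; n++
    }
    # Exit 1 if anything was reported, 0 otherwise
    END { exit (n > 0) }'
}
```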

The head and tail of the file look good. I also confirmed that the file is actually gzip-compressed rather than just carrying a .gz suffix.
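
A sketch of one way to do the gzip check (`check_gzip_fastq` is a hypothetical name). `gzip -t` decompresses in memory and verifies the CRC, so it rejects both a plain-text file renamed to .gz and a truncated .gz; the second step looks at the first character of the decompressed stream, which is what STAR sees through zcat.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): confirm a .gz file is real, complete gzip
# data and that its decompressed content starts with @.
check_gzip_fastq() {
  local f=$1 first
  # gzip -t fails on non-gzip data and on truncated/corrupt archives
  if ! gzip -t -- "$f" 2>/dev/null; then
    echo "$f: not valid or complete gzip data"
    return 1
  fi
  # First decompressed character; gzip may be hit by SIGPIPE once head
  # exits, so "|| true" keeps this safe under "set -o pipefail"
  first=$(gzip -cd -- "$f" 2>/dev/null | head -c 1 || true)
  if [ "$first" != "@" ]; then
    echo "$f: decompressed data does not start with @"
    return 1
  fi
  echo "$f: OK"
}
```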

I have run novoalign on this file without issue.

I'd like to suggest that when STAR hits a FASTQ format error like this, the error message include the offending line (and its line number). Is that easy to do, by chance? It would make hunting down the problem much easier on the user side.
