Description
This is the command I am using:
STAR \
--genomeDir star \
--readFilesIn 557_trimmed.fq.gz \
--runThreadN 16 \
--outFileNamePrefix 557. \
\
--sjdbGTFfile KN99_genome_fungidb_no_markers_NAT_G418.gtf \
--outSAMattrRGline ID:557 'SM:557' \
--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend
And this is the error:
ReadAlignChunk_processChunks.cpp:161:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >
Aug 08 20:14:56 ...... FATAL ERROR, exiting
I have seen the discussion of this in issue #379. I am using SLURM, not SGE, and `--readFilesCommand zcat` is present in the command.
I did examine the fastq file. I used an older tool called fastqValidator (does anybody have another they like? I haven't needed one before) as well as a nice awk one-liner suggested in issue #1101. In both cases, my fastq file came back clean.
I then looked for @ and > symbols outside of ID lines -- none -- and for @ and > symbols in the ID lines other than the first character. Again, none.
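For anyone who wants to repeat a record-level check like this, here is a minimal sketch. It is not the exact one-liner from issue #1101, and the function name `check_fastq_records` is made up for illustration. It assumes a strict 4-line-per-record fastq (no wrapped sequence lines) and prints the first offending line with its line number.

```shell
# Hypothetical helper (not part of STAR): report the first line that
# breaks the 4-line fastq record layout in a gzip-compressed file.
check_fastq_records() {
    # $1: path to a gzip-compressed fastq file
    gzip -dc "$1" | awk '
        # Line 1 of each record must start with @
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @: %s\n", NR, $0
            bad = 1; exit
        }
        # Line 2 is the sequence; remember its length
        NR % 4 == 2 { seqlen = length($0) }
        # Line 3 must start with +
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +: %s\n", NR, $0
            bad = 1; exit
        }
        # Line 4 (quality) must be as long as the sequence
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
            bad = 1; exit
        }
        END {
            if (!bad && NR == 0)     { print "no lines read"; bad = 1 }
            if (!bad && NR % 4 != 0) { printf "file ends mid-record at line %d\n", NR; bad = 1 }
            exit bad
        }
    '
}
```

Usage would be something like `check_fastq_records 557_trimmed.fq.gz && echo clean`. The function returns non-zero when it finds a problem, so it can be dropped into a pipeline script.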
The head and tail look good. I also checked to make sure the file is actually gzipped, as opposed to just having the .gz suffix. It is.
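The gzip check can be sketched like this. The function name `is_gzip` is hypothetical; it relies on the fact that gzip streams begin with the magic bytes 0x1f 0x8b, and assumes `head -c` and `od` are available (they are on typical Linux systems).

```shell
# Hypothetical helper: succeed only if the file starts with the gzip
# magic bytes (0x1f 0x8b), regardless of what its suffix claims.
is_gzip() {
    # $1: path to the file to inspect
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

Something like `is_gzip 557_trimmed.fq.gz && gzip -t 557_trimmed.fq.gz` also runs gzip's own integrity test, which reads the whole stream and would catch a truncated archive.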
I have run novoalign on this file without issue.
I'd like to suggest that if/when there is a fastq format error like this, the offending line be included in the error message -- is that easily done by chance? It would make hunting down the problem a lot easier on the user side.