
Chimeric.out.Junction file is the command used instead of actual junctions #1183

@jgoldst7

Hello,
I am running into an issue producing the Chimeric.out.Junction file from large input files. My pipeline uses STAR 2.7.7a with the following command:

STAR --genomeDir `pwd`/{params.index} \
		--runThreadN 12 \
		--readFilesIn {input.R1} {input.R2} \
		--outFileNamePrefix {params.outPrefix} \
		--outSAMtype BAM Unsorted \
		--readFilesCommand zcat \
		--quantMode TranscriptomeSAM \
		--outSAMunmapped Within \
		--chimSegmentMin 12 \
		--chimOutJunctionFormat 1 \
		--alignSJstitchMismatchNmax 5 -1 5 5 \
		--alignMatesGapMax 100000 \
		--twopassMode Basic &> {log}

This has worked in the past: the Chimeric.out.Junction file contained valid chimeric reads, which I used with STAR-Fusion 1.9.0 to find fusions. However, when I run the exact same command on very large FASTQ files (~10 GB for each of the 2 gzipped paired-end files, instead of the ~2 GB I usually use), I run into problems. STAR itself completes fine. This is the log file:

                                 Started job on |	Mar 20 15:52:11
                             Started mapping on |	Mar 20 17:38:10
                                    Finished on |	Mar 20 21:50:04
       Mapping speed, Million of reads per hour |	38.49

                          Number of input reads |	161591557
                      Average input read length |	279
                                    UNIQUE READS:
                   Uniquely mapped reads number |	153788119
                        Uniquely mapped reads % |	95.17%
                          Average mapped length |	278.95
                       Number of splices: Total |	172417208
            Number of splices: Annotated (sjdb) |	172128626
                       Number of splices: GT/AG |	170456671
                       Number of splices: GC/AG |	1466742
                       Number of splices: AT/AC |	153618
               Number of splices: Non-canonical |	340177
                      Mismatch rate per base, % |	0.24%
                         Deletion rate per base |	0.01%
                        Deletion average length |	1.93
                        Insertion rate per base |	0.01%
                       Insertion average length |	1.66
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |	4775002
             % of reads mapped to multiple loci |	2.95%
        Number of reads mapped to too many loci |	61939
             % of reads mapped to too many loci |	0.04%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |	0
       % of reads unmapped: too many mismatches |	0.00%
            Number of reads unmapped: too short |	2820621
                 % of reads unmapped: too short |	1.75%
                Number of reads unmapped: other |	145876
                     % of reads unmapped: other |	0.09%
                                  CHIMERIC READS:
                       Number of chimeric reads |	902812
                            % of chimeric reads |	0.56%
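
For comparing this summary against the Junction file, the chimeric read count can be extracted programmatically. The following is only a minimal sketch, assuming the summary has the `name | value` layout shown above; the function name `parse_star_final_log` and the embedded sample are illustrative and not part of STAR.

```python
def parse_star_final_log(text):
    """Return {field name: value string} for every 'name | value' line."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "CHIMERIC READS:" have no separator
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


# Sample lines copied from the summary above (assumed tab after the '|').
sample = (
    "                          Number of input reads |\t161591557\n"
    "                                  CHIMERIC READS:\n"
    "                       Number of chimeric reads |\t902812\n"
    "                            % of chimeric reads |\t0.56%\n"
)
stats = parse_star_final_log(sample)
n_chimeric = int(stats["Number of chimeric reads"])
print(n_chimeric)
```

A non-zero count here, together with a Junction file that holds no records, would suggest the chimeric reads were detected but never written to that file.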

A Chimeric.out.Junction file is produced. However, instead of chimeric reads, it contains only this:

# 2.7.7a   STAR --genomeDir /data1/snakemake/rnaseq.raw/indexes/star_rsem --runThreadN 12 --readFilesIn rnaseq.raw/30-446545408/results/trimming/UM-HACC-2A-1_R1_val_1.fq.gz rnaseq.raw/30-446545408/results/trimming/UM-HACC-2A-1_R2_val_2.fq.gz --outFileNamePrefix rnaseq.raw/30-446545408/results/alignment/UM-HACC-2A-1/ --outSAMtype BAM Unsorted --readFilesCommand zcat --quantMode TranscriptomeSAM --outSAMunmapped Within --chimSegmentMin 12 --chimOutJunctionFormat 1 --alignSJstitchMismatchNmax 5 -1 5 5 --alignMatesGapMax 100000 --twopassMode Basic
# Nreads 161591557	NreadsUnique 153788119	NreadsMulti 4775002

Any ideas why this might be happening? Could it be related to the FASTQ files being large? When I took a subsample of just 100,000 reads from these files (using seqtk), everything worked fine and the Chimeric.out.Junction file was normal.
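
A pipeline could detect this condition before handing the file to a downstream fusion caller. The check below is a rough sketch rather than anything STAR provides: it assumes that comment lines start with `#` (as in the output above) and that a column-header line, if one is present, starts with `chr_donorA`. The function name `summarize_junction_file` is hypothetical.

```python
def summarize_junction_file(lines):
    """Count junction records and collect '#' comment lines.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    n_records = 0
    comments = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line)
        elif line.startswith("chr_donorA"):
            continue  # assumed column-header line, not a junction record
        else:
            n_records += 1
    return n_records, comments


# The two comment lines reported above (command line shortened for brevity).
sample = [
    "# 2.7.7a   STAR --genomeDir /data1/snakemake/rnaseq.raw/indexes/star_rsem --runThreadN 12\n",
    "# Nreads 161591557\tNreadsUnique 153788119\tNreadsMulti 4775002\n",
]
n_records, comments = summarize_junction_file(sample)
if n_records == 0:
    print("Chimeric.out.Junction contains no junction records")
```

On a real run the same function can be called on the file directly, for example `with open("Chimeric.out.Junction") as fh: summarize_junction_file(fh)`, with the path adjusted to the output prefix in use.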
