STARsolo CB_samTagOut mode error in combo with STAR diploid #2058

@mparker2

Hi @alexdobin ,

Thanks again for maintaining STAR. I am trying to map single-cell ATAC reads against a genome index generated with the genome parameter --genomeTransformType Diploid, using the mapping parameter --soloType CB_samTagOut. However, this results in the error:

EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Feb 09 11:43:59 ...... FATAL ERROR, exiting

I used the following parameters to generate the genome:

        STAR \
          --runThreadN 16 \
          --runMode "genomeGenerate" \
          --genomeDir "{output}" \
          --genomeFastaFiles "{input.fasta_fn}" \
          --sjdbGTFfile "{input.gtf_fn}" \
          --genomeTransformVCF "{input.vcf_fn}" \
          --genomeTransformType "Diploid" \
          --genomeSAindexNbases 12 \
          --sjdbOverhang 149

This runs successfully:

	STAR version: 2.7.11a   compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:05:45 ..... started STAR run
Feb 09 12:05:45 ... starting to generate Genome files
Feb 09 12:05:46 ..... processing annotations GTF
Feb 09 12:05:53 ... starting to sort Suffix Array. This may take a long time...
Feb 09 12:05:54 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 12:07:59 ... loading chunks from disk, packing SA...
Feb 09 12:08:03 ... finished generating suffix array
Feb 09 12:08:03 ... generating Suffix Array index
Feb 09 12:08:16 ... completed Suffix Array index
Feb 09 12:08:16 ..... inserting junctions into the genome indices
Feb 09 12:10:11 ... writing Genome to disk ...
Feb 09 12:10:14 ... writing Suffix Array to disk ...
Feb 09 12:10:37 ... writing SAindex to disk
Feb 09 12:10:38 ..... finished successfully
Feb 09 12:10:38 ... starting to generate Genome files
Feb 09 12:10:40 ..... processing annotations GTF
Feb 09 12:10:42 ... starting to sort Suffix Array. This may take a long time...
Feb 09 12:10:42 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 12:11:12 ... loading chunks from disk, packing SA...
Feb 09 12:11:14 ... finished generating suffix array
Feb 09 12:11:14 ... generating Suffix Array index
Feb 09 12:11:23 ... completed Suffix Array index
Feb 09 12:11:23 ..... inserting junctions into the genome indices
Feb 09 12:12:13 ... writing Genome to disk ...
Feb 09 12:12:14 ... writing Suffix Array to disk ...
Feb 09 12:12:34 ... writing SAindex to disk
Feb 09 12:12:36 ..... finished successfully

I then used this command to map the reads:

        STAR \
          --runThreadN 16 \
          --genomeDir "{input.index}" \
          --readFilesIn "{input.read}" "{input.mate}" "{input.barcode}" \
          --readFilesCommand "zcat" \
          --soloType "CB_samTagOut" \
          --soloCBmatchWLtype "1MM" \
          --soloCBwhitelist "{input.barcode_whitelist}" \
          --soloBarcodeReadLength 0 \
          --outFilterMultimapNmax 1 \
          --outFilterMismatchNmax 4 \
          --alignIntronMax 1 \
          --alignMatesGapMax 1000 \
          --outSAMtype BAM Unsorted \
          --outSAMattributes NH HI AS nM CB sS sQ ha \
          --genomeTransformOutput SAM

This produces the error:

	STAR version: 2.7.11a   compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:12:47 ..... started STAR run
Feb 09 12:12:47 ..... loading genome

EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Feb 09 12:13:16 ...... FATAL ERROR, exiting

I am confident the genome index is not corrupt, because I can use the same index to map single-cell RNA reads with the command:

        STAR \
          --runThreadN 16 \
          --genomeDir "{input.index}" \
          --readFilesIn "{input.mate}" "{input.read_barcode}" \
          --readFilesCommand "zcat" \
          --soloType "CB_UMI_Simple" \
          --soloCBwhitelist "{input.barcode_whitelist}" \
          --soloUMIlen 12 \
          --soloUMIdedup "1MM_Directional_UMItools" \
          --outFilterMultimapNmax 2 \
          --outFilterIntronMotifs "RemoveNoncanonical" \
          --alignSJoverhangMin 12 \
          --alignSJDBoverhangMin 4 \
          --outFilterMismatchNmax 2 \
          --alignIntronMin 60 \
          --alignIntronMax 20000 \
          --outSAMtype "BAM" "SortedByCoordinate" \
          --outBAMsortingBinsN 150 \
          --limitBAMsortRAM {params.sort_mem} \
          --outSAMattributes NH HI AS nM CB UB ha \
          --genomeTransformOutput SAM SJ Quant

This command has no problem loading the genome and begins mapping normally (I killed the process once the genome appeared to have loaded successfully):

	STAR version: 2.7.11a   compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:54:08 ..... started STAR run
Feb 09 12:54:09 ..... loading genome
Feb 09 12:54:10 ..... started mapping
^C

Many thanks
Matt
