Hi @alexdobin,
Thanks again for maintaining STAR. I am trying to use STAR with the mapping parameter --soloType CB_samTagOut to map single-cell ATAC reads against a genome generated with --genomeTransformType Diploid. However, this results in the following error:
EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Feb 09 11:43:59 ...... FATAL ERROR, exiting
I used the following parameters to generate the genome:
STAR \
--runThreadN 16 \
--runMode "genomeGenerate" \
--genomeDir "{output}" \
--genomeFastaFiles "{input.fasta_fn}" \
--sjdbGTFfile "{input.gtf_fn}" \
--genomeTransformVCF "{input.vcf_fn}" \
--genomeTransformType "Diploid" \
--genomeSAindexNbases 12 \
--sjdbOverhang 149
This runs successfully:
STAR version: 2.7.11a compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:05:45 ..... started STAR run
Feb 09 12:05:45 ... starting to generate Genome files
Feb 09 12:05:46 ..... processing annotations GTF
Feb 09 12:05:53 ... starting to sort Suffix Array. This may take a long time...
Feb 09 12:05:54 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 12:07:59 ... loading chunks from disk, packing SA...
Feb 09 12:08:03 ... finished generating suffix array
Feb 09 12:08:03 ... generating Suffix Array index
Feb 09 12:08:16 ... completed Suffix Array index
Feb 09 12:08:16 ..... inserting junctions into the genome indices
Feb 09 12:10:11 ... writing Genome to disk ...
Feb 09 12:10:14 ... writing Suffix Array to disk ...
Feb 09 12:10:37 ... writing SAindex to disk
Feb 09 12:10:38 ..... finished successfully
Feb 09 12:10:38 ... starting to generate Genome files
Feb 09 12:10:40 ..... processing annotations GTF
Feb 09 12:10:42 ... starting to sort Suffix Array. This may take a long time...
Feb 09 12:10:42 ... sorting Suffix Array chunks and saving them to disk...
Feb 09 12:11:12 ... loading chunks from disk, packing SA...
Feb 09 12:11:14 ... finished generating suffix array
Feb 09 12:11:14 ... generating Suffix Array index
Feb 09 12:11:23 ... completed Suffix Array index
Feb 09 12:11:23 ..... inserting junctions into the genome indices
Feb 09 12:12:13 ... writing Genome to disk ...
Feb 09 12:12:14 ... writing Suffix Array to disk ...
Feb 09 12:12:34 ... writing SAindex to disk
Feb 09 12:12:36 ..... finished successfully
I then mapped the reads with this command:
STAR \
--runThreadN 16 \
--genomeDir "{input.index}" \
--readFilesIn "{input.read}" "{input.mate}" "{input.barcode}" \
--readFilesCommand "zcat" \
--soloType "CB_samTagOut" \
--soloCBmatchWLtype "1MM" \
--soloCBwhitelist "{input.barcode_whitelist}" \
--soloBarcodeReadLength 0 \
--outFilterMultimapNmax 1 \
--outFilterMismatchNmax 4 \
--alignIntronMax 1 \
--alignMatesGapMax 1000 \
--outSAMtype BAM Unsorted \
--outSAMattributes NH HI AS nM CB sS sQ ha \
--genomeTransformOutput SAM
This produces the error:
STAR version: 2.7.11a compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:12:47 ..... started STAR run
Feb 09 12:12:47 ..... loading genome
EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Feb 09 12:13:16 ...... FATAL ERROR, exiting
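Since the error names transcriptInfo.tab specifically, one quick sanity check is to compare the transcript count declared on the file's first line with the number of records that follow it. The bash function below is a minimal sketch of that check. `check_transcript_info` is a hypothetical helper (not part of STAR), and it assumes the first line holds the transcript count and every following non-empty line describes one transcript. A mismatch would point to a truncated file; a match would not rule out an incompatibility in the columns themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): sanity-check one transcriptInfo.tab.
# ASSUMPTION: line 1 holds the transcript count; each later non-empty line is one transcript.
check_transcript_info() {
    local tab="$1"
    if [[ ! -s "$tab" ]]; then
        echo "missing or empty: $tab" >&2
        return 2
    fi
    local declared records
    # Declared count, with whitespace (including a stray \r) stripped
    declared=$(head -n 1 "$tab" | tr -d '[:space:]')
    if [[ ! "$declared" =~ ^[0-9]+$ ]]; then
        echo "first line is not a number: $tab" >&2
        return 1
    fi
    # Count non-empty lines after the header; '|| true' makes grep's exit 1 (zero matches) harmless
    records=$(tail -n +2 "$tab" | grep -c -v '^[[:space:]]*$' || true)
    # 10# forces base 10 so a leading zero is not read as octal
    if (( 10#$declared != records )); then
        echo "declared $declared transcripts but found $records records: $tab" >&2
        return 1
    fi
    echo "ok: $declared transcripts in $tab"
}
```

Usage would be something like `check_transcript_info /path/to/index/transcriptInfo.tab`, where the path is a placeholder for the real index directory.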
I am confident the genome index is not corrupt, because the same index loads without error when I map single-cell RNA reads with this command:
STAR \
--runThreadN 16 \
--genomeDir "{input.index}" \
--readFilesIn "{input.mate}" "{input.read_barcode}" \
--readFilesCommand "zcat" \
--soloType "CB_UMI_Simple" \
--soloCBwhitelist "{input.barcode_whitelist}" \
--soloUMIlen 12 \
--soloUMIdedup "1MM_Directional_UMItools" \
--outFilterMultimapNmax 2 \
--outFilterIntronMotifs "RemoveNoncanonical" \
--alignSJoverhangMin 12 \
--alignSJDBoverhangMin 4 \
--outFilterMismatchNmax 2 \
--alignIntronMin 60 \
--alignIntronMax 20000 \
--outSAMtype "BAM" "SortedByCoordinate" \
--outBAMsortingBinsN 150 \
--limitBAMsortRAM {params.sort_mem} \
--outSAMattributes NH HI AS nM CB UB ha \
--genomeTransformOutput SAM SJ Quant
This command loads the genome without error and starts mapping (I killed the process once mapping had begun):
STAR version: 2.7.11a compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Feb 09 12:54:08 ..... started STAR run
Feb 09 12:54:09 ..... loading genome
Feb 09 12:54:10 ..... started mapping
^C
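The genomeGenerate log above shows two full generation passes, so it may also help to see every transcriptInfo.tab that ended up under the index directory, together with the count each one declares. This is a minimal sketch: `list_transcript_info` is a hypothetical helper, it assumes the first line of each file is the transcript count, and whether the Diploid transform writes a second copy of the annotation files (and where) is an assumption to check rather than something I know.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): list every transcriptInfo.tab under an
# index directory as "<first line>\t<path>".
# ASSUMPTION: the first line of each file is the declared transcript count.
list_transcript_info() {
    local index_dir="$1"
    if [[ ! -d "$index_dir" ]]; then
        echo "not a directory: $index_dir" >&2
        return 2
    fi
    local found=0 tab
    # -print0 / read -d '' keep paths with spaces intact; sort -z gives a stable order
    while IFS= read -r -d '' tab; do
        found=$((found + 1))
        printf '%s\t%s\n' "$(head -n 1 "$tab")" "$tab"
    done < <(find "$index_dir" -type f -name transcriptInfo.tab -print0 | sort -z)
    if (( found == 0 )); then
        echo "no transcriptInfo.tab under $index_dir" >&2
        return 1
    fi
}
```

Running it against the index directory would show whether there is one file or several, which is useful context for which copy the CB_samTagOut run might be trying to read.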
Many thanks
Matt