Description
Hi Jon,
I'm using version 1.8.8 downloaded May 17 using pip, inside a conda environment.
Background:
About a month ago I ran funannotate train using RNAseq, and then funannotate predict to predict genes in one strain of a fungal species.
I then saved the species parameters (funannotate species -s fp2516_v2 -a <parameter_file.json>) for use in predicting genes in other strains of the same species.
Last week, I ran funannotate predict (using v1.8.7) on a different strain of the same species, using the saved species parameters. The pipeline seemed to work, although it reported two failures. Here is some of the output from that run (I bolded and italicized the failures):
/home/tommy/miniconda3/envs/funannotate/bin/funannotate predict -i Fp157.masked.fasta -o Fp157_fp2516 -s fp2516_v2 --repeat_filter blast --cpus 24
[05/16/21 17:26:33]: OS: Ubuntu 20.04, 24 cores, ~ 74 GB RAM. Python: 3.7.9
[05/16/21 17:26:33]: Running funannotate v1.8.7
[05/16/21 17:26:33]: GeneMark path: /home/tommy/funannotate/genemark/gmes_petap/
[05/16/21 17:26:33]: Full path to gmes_petap.pl: /home/tommy/funannotate/genemark/gmes_petap/gmes_petap.pl
[05/16/21 17:26:33]: GeneMark appears to be functional? True
[05/16/21 17:26:33]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[05/16/21 17:26:33]: Skipping CodingQuarry as no --rna_bam passed
[05/16/21 17:26:33]: {'augustus': 'pretrained', 'genemark': 'pretrained', 'snap': 'pretrained', 'glimmerhmm': 'pretrained'}
[05/16/21 17:26:33]: Parsed training data, run ab-initio gene predictors as follows:
[05/16/21 17:26:34]: {'augustus': 1, 'hiq': 2, 'genemark': 1, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[05/16/21 17:26:45]: Loading genome assembly and parsing soft-masked repetitive sequences
[05/16/21 17:26:48]: Genome loaded: 13 scaffolds; 44,000,513 bp; 14.62% repeats masked
[05/16/21 18:25:08]: join_mult_hints.pl
[05/16/21 18:25:09]: Running GeneMark-ES on assembly
[05/16/21 18:25:09]: /home/tommy/funannotate/genemark/gmes_petap/gmes_petap.pl --ES --max_intron 3000 --soft_mask 2000 --cores 24 --sequence /home/tommy/funannotate/Fp157/Fp157_fp2516/predict_misc/genome.softmasked.fa --fungus --ini_mod /home/tommy/funannotate/Fp157/Fp157_fp2516/predict_misc/ab_initio_parameters/fp2516_v2.genemark.mod
[05/16/21 18:44:05]: (None, b'')
[05/16/21 18:44:06]: perl /home/tommy/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl Fp157_fp2516/predict_misc/genemark.gff
[05/16/21 18:44:08]: 13,571 predictions from GeneMark
[05/16/21 18:44:09]: Running Augustus gene prediction using fp2516_v2 parameters
[05/16/21 18:47:47]: perl /home/tommy/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl Fp157_fp2516/predict_misc/augustus.gff3
[05/16/21 18:47:49]: Pulling out high quality Augustus predictions
[05/16/21 18:47:50]: Found 164 high quality predictions from Augustus (>90% exon evidence)
[05/16/21 18:47:50]: Running SNAP gene prediction, using pre-trained HMM profile
[05/16/21 18:47:50]: snap /home/tommy/funannotate/Fp157/Fp157_fp2516/predict_misc/ab_initio_parameters/fp2516_v2.snap.hmm /home/tommy/funannotate/Fp157/Fp157_fp2516/predict_misc/genome.softmasked.fa
[05/16/21 18:51:22]: 13,673 predictions from SNAP
[05/16/21 18:51:22]: snap failed removing from training parameters
[05/16/21 18:51:22]: Running GlimmerHMM gene prediction, using pretrained HMM profile
...etc. (Let me know if you want the whole logfile. The run finished more or less successfully, but as an aside, the annotation appears to be missing 1,000 or more genes. I'm not sure why, nor why there are so few high-quality Augustus predictions.)
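Since the bold/italic markup may not survive copy-paste, here is a minimal sketch of how the failure lines could be pulled out of a log like the one above. It assumes each log line has the form `[timestamp]: message`, as in the excerpts in this report; the function name `find_failures` and the keyword list are my own choices for illustration, not part of funannotate.

```python
import re

# Assumed line format, taken from the excerpts above: "[timestamp]: message"
LOG_LINE = re.compile(r"^\[(?P<stamp>[^\]]+)\]:\s*(?P<msg>.*)$")


def find_failures(lines, keywords=("failed", "error", "bug")):
    """Return (timestamp, message) pairs whose message contains a keyword.

    The keyword list is a guess at what marks a problem in the log;
    adjust it as needed.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not timestamped log entries
        msg = match.group("msg")
        if any(word in msg.lower() for word in keywords):
            hits.append((match.group("stamp"), msg))
    return hits


# Two lines copied from the v1.8.7 output above
sample = [
    "[05/16/21 18:51:22]: 13,673 predictions from SNAP",
    "[05/16/21 18:51:22]: snap failed removing from training parameters",
]

for stamp, msg in find_failures(sample):
    print(stamp, msg)
```

To run it on a full logfile, the lines could be read with `open(path)` and passed in directly; the path would depend on where the run wrote its log.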
Current Issue:
I saw that the output in issue #591 showed the same failures from SNAP and GlimmerHMM, so I figured I'd better update to the latest release, v1.8.8. Now, when I run the same command as above, I get the following error after GeneMark-ES finishes:
[May 17 10:59 AM]: RunBusco is set to False and args.pasa_gff is None --> cannot generate training set. If you are reading this it is a bug, please report.
Here's the full command and output. (Note: I added the Trinity FASTA as transcript_evidence just to see if that would change anything. I'm also running with nohup because of a spotty connection.)
nohup funannotate predict -i Fp157.masked.fasta -o Fp157_fp2516_no_Trin -s fp2516_v2 --repeat_filter blast --cpus 24 &> Fp157_fp2516_no_trinity.out.log &
[May 17 12:07 PM]: OS: Ubuntu 20.04, 24 cores, ~ 74 GB RAM. Python: 3.7.9
[May 17 12:07 PM]: Running funannotate v1.8.8
[May 17 12:07 PM]: Skipping CodingQuarry as no --rna_bam passed
[May 17 12:07 PM]: Parsed training data, run ab-initio gene predictors as follows:
Program Training-Method
augustus pretrained
genemark pretrained
glimmerhmm pretrained
snap pretrained
[May 17 12:07 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[May 17 12:07 PM]: Genome loaded: 13 scaffolds; 44,000,513 bp; 14.62% repeats masked
[May 17 12:07 PM]: Mapping 551,348 proteins to genome using diamond and exonerate
[May 17 12:12 PM]: Found 295,312 preliminary alignments --> aligning with exonerate
[May 17 01:06 PM]: Exonerate finished: found 1,793 alignments
[May 17 01:06 PM]: Running GeneMark-ES on assembly
[May 17 01:24 PM]: 13,551 predictions from GeneMark
[May 17 01:24 PM]: RunBusco is set to False and args.pasa_gff is None --> cannot generate training set. If you are reading this it is a bug, please report.