smcluster.proteins.fasta from antismash output is empty. funannotate annotate fails at MIBiG cross-reference stage #1147

@eggrandio

Hello,

I am running funannotate annotate, but it fails at the "Cross referencing SM cluster hits with MIBiG database version 1.4" step. It seems that annotate_misc/antismash/smcluster.proteins.fasta is empty.

I had to do some tweaking to run antiSMASH. First, I changed the version of antiSMASH used by editing the remote.py file, because it was giving an "invalid job type" error when using jobtype: antismash6.

cd /home/eggrandio/miniconda3/envs/funa/lib/python3.11/site-packages/funannotate
cp remote.py remote.py.bak
sed -i 's/antismash6/antismash7/' remote.py

Also, there were some gene annotations with the same CDS but different UTRs that caused an error in antiSMASH (see the similar issue 931). I removed the conflicting annotations from the .tbl file and generated a separate .gbk file using funannotate fix, as suggested here.

Then, I ran antiSMASH using this modified .gbk file:

funannotate remote -g ./funannotate/ppax/antismash/Penicillium_paxilli2_edited_gbk/Penicillium_paxilli2.gbk -o ./funannotate/ppax/antismash/ -m antismash -e <email>

Finally, I passed the antiSMASH results to funannotate annotate. It recognizes and parses them correctly, but it fails at the "Cross referencing SM cluster hits with MIBiG database" step. When I check the funannotate/ppax/annotate_misc/antismash/smcluster.proteins.fasta file, it is indeed empty.
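
To confirm the empty-file diagnosis independently of diamond, the records in the FASTA file can be counted directly. This is only a minimal sketch using the Python standard library; the helper name `count_fasta_records` is made up for illustration and is not part of funannotate.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count '>' header lines in a FASTA file; return 0 if the file is missing or empty."""
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Usage, with the path taken from the failing diamond command:
# n = count_fasta_records(
#     "./funannotate/ppax/annotate_misc/antismash/smcluster.proteins.fasta")
# print(n)
```

A count of zero here means the file was written without any sequences (or was never written), so the problem lies upstream of the diamond call.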

I guess the error comes from incorrect parsing of antiSMASH version 7 output. There is an old issue in which someone had the same problem with plantiSMASH output (issue 121), and it seems that on Mar 5, 2021 other people were still reporting the same error.
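
One way to test the parsing hypothesis is to tally which feature keys the antiSMASH GenBank file actually contains. The sketch below reads the feature table as plain text (feature keys start in column 6 of a GenBank flat file) and needs no third-party library. The function name `count_feature_keys` is hypothetical. As far as I know, antiSMASH 5 and later write `region`, `protocluster` and `cand_cluster` features where antiSMASH 4 wrote `cluster`, but treat that as an assumption to verify against your own file.

```python
import re
from collections import Counter

# A feature key sits after exactly five spaces; qualifier lines are indented further.
FEATURE_KEY = re.compile(r"^ {5}(\S+) +\S")


def count_feature_keys(path):
    """Tally feature keys (CDS, region, ...) across all records of a GenBank file."""
    counts = Counter()
    in_features = False
    with open(path) as handle:
        for line in handle:
            if line.startswith("FEATURES"):
                in_features = True
            elif line.startswith("ORIGIN") or line.startswith("//"):
                in_features = False
            elif in_features:
                match = FEATURE_KEY.match(line)
                if match:
                    counts[match.group(1)] += 1
    return counts


# Usage, with the file passed to --antismash:
# print(count_feature_keys(
#     "./funannotate/ppax/antismash/annotate_misc/antiSMASH.results.gbk"))
```

Comparing the `region` and `CDS` counts with the "Found 54 clusters" line in the log gives a baseline for checking which features the FASTA-writing step might be skipping.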

I am trying to figure out how this smcluster.proteins.fasta file is created, but I cannot find it in the annotate.py code.
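
Since the file name may be assembled in a module other than annotate.py, searching every .py file in the installed package can locate it. The following is a rough standard-library sketch; `find_in_sources` is a made-up helper, and the search string is a guess (if the name is built from separate pieces, a shorter needle such as "smcluster" may be needed).

```python
from pathlib import Path


def find_in_sources(root, needle):
    """Return (file, line number, line text) for every .py line under root containing needle."""
    hits = []
    for py_file in sorted(Path(root).rglob("*.py")):
        try:
            text = py_file.read_text(errors="replace")
        except OSError:
            continue  # skip unreadable files instead of aborting the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(py_file), lineno, line.strip()))
    return hits


# Usage, with the package directory from the sed step above:
# pkg = "/home/eggrandio/miniconda3/envs/funa/lib/python3.11/site-packages/funannotate"
# for path, lineno, line in find_in_sources(pkg, "smcluster.proteins"):
#     print(f"{path}:{lineno}: {line}")
```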

funannotate annotate --antismash ./funannotate/ppax/antismash/annotate_misc/antiSMASH.results.gbk -i ./funannotate/ppax/ --cpus 28 --sbt ./funannotate/ppax/template.sbt
/home/eggrandio/miniconda3/envs/funa/lib/python3.11/site-packages/funannotate/funannotate.py:11: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import get_distribution
-------------------------------------------------------
[Jan 07 02:11 PM]: OS: Ubuntu 22.04, 32 cores, ~ 62 GB RAM. Python: 3.11.14
[Jan 07 02:11 PM]: Running 1.8.17
[Jan 07 02:11 PM]: Found existing output directory ./funannotate/ppax. Warning, will re-use any intermediate files found.
[Jan 07 02:11 PM]: Parsing input files
[Jan 07 02:11 PM]: Existing tbl found: ./funannotate/ppax/update_results/Penicillium_paxilli2.tbl
[Jan 07 02:11 PM]: Adding Functional Annotation to Penicillium paxilli2, NCBI accession: None
[Jan 07 02:11 PM]: Annotation consists of: 13,740 gene models
[Jan 07 02:11 PM]: 14,175 protein records loaded
[Jan 07 02:11 PM]: Running HMMer search of PFAM version 38.0
[Jan 07 02:30 PM]: 17,615 annotations added
[Jan 07 02:30 PM]: Running Diamond blastp search of UniProt DB version 2025_04
[Jan 07 02:30 PM]: 994 valid gene/product annotations from 1,792 total
[Jan 07 02:30 PM]: Running Eggnog-mapper
[Jan 07 03:09 PM]: Parsing EggNog Annotations
[Jan 07 03:09 PM]: EggNog version parsed as 2.1.13
[Jan 07 03:09 PM]: 26,336  COG and EggNog annotations added
[Jan 07 03:09 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.97
[Jan 07 03:09 PM]: 3,110 gene name and product description annotations added
[Jan 07 03:09 PM]: Running Diamond blastp search of MEROPS version 12.5
[Jan 07 03:09 PM]: 404 annotations added
[Jan 07 03:09 PM]: Annotating CAZYmes using HMMer search of dbCAN version 14.0
[Jan 07 03:10 PM]: 532 annotations added
[Jan 07 03:10 PM]: Annotating proteins with BUSCO dikarya models
[Jan 07 03:12 PM]: 1,312 annotations added
[Jan 07 03:12 PM]: Predicting secreted and transmembrane proteins using Phobius
     Progress: 14175 complete, 0 failed, 0 remaining           
[Jan 07 03:16 PM]: Predicting secreted proteins with SignalP
[Jan 07 03:21 PM]: 1,164 secretome and 3,501 transmembane annotations added
[Jan 07 03:21 PM]: Parsing InterProScan5 XML file
[Jan 07 03:23 PM]: Now parsing antiSMASH v7 results, finding SM clusters
[Jan 07 03:23 PM]: Found 54 clusters, 63 biosynthetic enyzmes, and 203 smCOGs predicted by antiSMASH
[Jan 07 03:23 PM]: Found 0 duplicated annotations, adding 133,757 valid annotations
[Jan 07 03:23 PM]: Converting to final Genbank format, good luck!
/home/eggrandio/miniconda3/envs/funa/lib/python3.11/site-packages/funannotate/library.py:1095: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import get_distribution
[Jan 07 03:25 PM]: Creating AGP file and corresponding contigs file
[Jan 07 03:25 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
[Jan 07 03:25 PM]: CMD ERROR: diamond blastp --sensitive --query ./funannotate/ppax/annotate_misc/antismash/smcluster.proteins.fasta --threads 28 --out ./funannotate/ppax/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /home/eggrandio/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
[Jan 07 03:25 PM]: diamond v2.1.16.170 (C) Max Planck Society for the Advancement of Science, Benjamin J. Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 28
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: ./funannotate/ppax/annotate_misc/antismash
#Target sequences to report alignments for: 1
Opening the database...  [0.045s]
Database: /home/eggrandio/funannotate_db/mibig.dmnd (type: Diamond database, sequences: 31023, letters: 18898150)
Block size = 2000000000
Opening the input file... Error: Error detecting input file format. Input file seems to be empty.

Any help would be appreciated.

Best,
