A Pyrodigal extension to predict genes in giant viruses and viruses that use alternative genetic codes.
Pyrodigal is a Python module that provides Cython bindings to Prodigal, an efficient gene finding method for genomes and metagenomes based on dynamic programming.
pyrodigal-gv is a small extension module for pyrodigal which distributes additional metagenomic models for giant viruses and viruses that use alternative genetic codes, first provided by Antônio Camargo in prodigal-gv. The new models are the following:
- Acanthamoeba polyphaga mimivirus
- Paramecium bursaria Chlorella virus
- Acanthocystis turfacea Chlorella virus
- VirSorter2's NCLDV gene model
- Topaz (genetic code 15)
- Agate (genetic code 15)
- Gut phages (genetic code 15)
- Gut phages (genetic code 11) × 5
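Several of these models target genetic code 15, in which the TAG codon, a stop codon in the standard bacterial code (translation table 11), is read as glutamine (Q). The pure-Python sketch below illustrates that difference on a short made-up sequence. It is only an illustration and not how pyrodigal-gv translates genes internally; codon_table and translate are hypothetical helpers.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Amino acids of translation table 11 (same assignments as the standard code),
# listed for codons enumerated in TCAG order at positions 1, 2 and 3.
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def codon_table(table_id: int) -> Dict[str, str]:
    """Hypothetical helper: build a codon -> amino acid map for table 11 or 15."""
    # product() varies the last position fastest, matching the order of STANDARD_AA.
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if table_id == 15:
        table["TAG"] = "Q"  # code 15 reassigns the TAG stop codon to glutamine
    elif table_id != 11:
        raise ValueError("this sketch only covers tables 11 and 15")
    return table


def translate(dna: str, table_id: int = 11) -> str:
    """Translate complete codons of `dna`; '*' marks a stop codon."""
    table = codon_table(table_id)
    dna = dna.upper()
    end = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, end, 3))


print(translate("ATGCAGTAGTGGTAA", 11))  # MQ*W*
print(translate("ATGCAGTAGTGGTAA", 15))  # MQQW*
```

Under table 11 the internal TAG would end the protein early, which is why a model built for code 15 can call longer genes on such genomes.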
pyrodigal-gv can be installed directly from PyPI as a universal wheel that contains all required data files:
$ pip install pyrodigal-gv
Just use the provided ViralGeneFinder class instead of the usual GeneFinder from pyrodigal, and the new viral models will be used automatically in meta mode:
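import Bio.SeqIO
import pyrodigal_gv

# Read a single GenBank record and predict genes in meta mode
record = Bio.SeqIO.read("sequence.gbk", "genbank")
orf_finder = pyrodigal_gv.ViralGeneFinder(meta=True)
for i, pred in enumerate(orf_finder.find_genes(bytes(record.seq))):
    # Print each predicted protein as a FASTA record named {record id}_{gene number}
    print(f">{record.id}_{i+1}")
    print(pred.translate())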
ViralGeneFinder has an additional keyword argument, viral_only, which can be set to True to run gene calling using only viral models.
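To see what restricting the search to viral models changes for a given sequence, one option is to run both finders and compare the predicted coordinates. The sketch below assumes only that each prediction exposes begin, end and strand attributes, as pyrodigal's gene objects do; compare_runs and coordinates are hypothetical helpers, not part of pyrodigal-gv.

```python
from typing import Any, Dict, Iterable, Set, Tuple

Coord = Tuple[int, int, int]


def coordinates(genes: Iterable[Any]) -> Set[Coord]:
    """Collect (begin, end, strand) for each predicted gene."""
    return {(g.begin, g.end, g.strand) for g in genes}


def compare_runs(default_genes: Iterable[Any], viral_genes: Iterable[Any]) -> Dict[str, Set[Coord]]:
    """Hypothetical helper: split coordinates into shared and run-specific sets."""
    default = coordinates(default_genes)
    viral = coordinates(viral_genes)
    return {
        "shared": default & viral,
        "default_only": default - viral,
        "viral_only": viral - default,
    }


# Usage with pyrodigal-gv (requires the package and a DNA sequence `seq`):
# import pyrodigal_gv
# default = pyrodigal_gv.ViralGeneFinder(meta=True)
# viral = pyrodigal_gv.ViralGeneFinder(meta=True, viral_only=True)
# report = compare_runs(default.find_genes(seq), viral.find_genes(seq))
```

Comparing exact coordinates keeps the sketch simple; genes that differ only by their start codon show up in both run-specific sets.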
pyrodigal-gv comes with a very simple command line similar to Prodigal and pyrodigal:
$ pyrodigal-gv -i <input_file.fasta> -a <gene_translations.fasta> -d <gene_sequences.fasta>
Unlike prodigal and pyrodigal, the pyrodigal-gv script runs in meta mode by default! Running in single mode is possible with pyrodigal-gv -p single, but the results will be exactly the same as with pyrodigal, so why would you ever do this?
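Something close to pyrodigal-gv -i input.fasta -a gene_translations.fasta can also be approximated from Python for multi-record FASTA input. This is a minimal sketch, not the script's actual implementation: read_fasta and translate_all are hypothetical helpers, the finder only needs a find_genes method (for instance a ViralGeneFinder(meta=True)), and the >{id}_{n} headers follow the Python example above rather than the exact headers the command line writes.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate_all(handle: TextIO, finder, out: TextIO) -> int:
    """Write one protein FASTA record per predicted gene and return the gene count."""
    count = 0
    for seq_id, seq in read_fasta(handle):
        # In meta mode each record is processed independently, no training needed
        genes = finder.find_genes(seq.encode("ascii"))
        for i, gene in enumerate(genes, start=1):
            out.write(f">{seq_id}_{i}\n{gene.translate()}\n")
            count += 1
    return count


# Usage with pyrodigal-gv (requires the package):
# import pyrodigal_gv
# with open("input.fasta") as src, open("gene_translations.fasta", "w") as dst:
#     translate_all(src, pyrodigal_gv.ViralGeneFinder(meta=True), dst)
```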
If you use the features provided by pyrodigal-gv beyond the base Pyrodigal functionality, please cite the original manuscript detailing these extensions:
Camargo, A. P., Roux, S., Schulz, F., Babinski, M., Xu, Y., Hu, B., ... and Kyrpides, N. C. (2023). Identification of mobile genetic elements with geNomad. Nature Biotechnology, 1-10.
Pyrodigal is scientific software, with a published paper in the Journal of Open-Source Software. Please cite both Pyrodigal and Prodigal if you are using it in an academic work, for instance as:
Pyrodigal (Larralde, 2022), a Python library binding to Prodigal (Hyatt et al., 2010).
Detailed references are available on the Publications page of the online documentation.
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.
Contributions are more than welcome! See CONTRIBUTING.md for more details.
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
This library is provided under the GNU General Public License v3.0.
The Prodigal code was written by Doug Hyatt and is distributed under the terms of the GPLv3 as well. See vendor/Prodigal/LICENSE for more information.
The giant virus and alternative genetic code virus parameters were created by Antônio Camargo.
This project is in no way affiliated with, sponsored, or otherwise endorsed by the original Prodigal authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.