Cython bindings and Python interface to ARAGORN, a (t|mt|tm)RNA gene finder.
ARAGORN is a fast method developed by Dean Laslett & Björn Canbäck[1] to identify tRNA and tmRNA genes in genomic sequences, using heuristics to detect potential high-scoring stem-loop structures. The complementary method ARWEN, developed by the same authors[2] to support the detection of metazoan mitochondrial RNA (mtRNA) genes, was later integrated into ARAGORN.
pyaragorn is a Python module that provides bindings to ARAGORN and ARWEN using Cython. It interacts directly with the ARAGORN internals, which has the following advantages:
- single dependency: PyARAGORN is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the ARAGORN binary being present on the end-user machine.
- no intermediate files: Everything happens in memory, in a Python object you fully control, so you don't have to invoke the ARAGORN CLI using a sub-process and temporary files. Sequences can be passed directly as strings, bytes, or any buffer objects, which avoids the overhead of formatting your input to FASTA for ARAGORN.
- no output parsing: The detected RNA genes are returned as Python objects with transparent attributes, which makes the results easier to handle than parsing the output tables of ARAGORN.
- same results: PyARAGORN is tested to ensure it produces the same results as ARAGORN v1.2.41, the latest release.
PyARAGORN currently supports the following features from the ARAGORN command line:
- tRNA gene detection (aragorn -t).
- tmRNA gene detection (aragorn -m).
- mtRNA gene detection (aragorn -mt).
- Reporting of batch mode metadata (aragorn -w).
- Alternative genetic code (aragorn -gc).
- Custom genetic code (aragorn -gc<n>,BBB=<aa>).
- Circular and linear topologies (aragorn -c | aragorn -l).
- Intron length configuration (aragorn -i).
- Scoring threshold configuration (aragorn -ps).
- Sequence extraction from RNA gene (aragorn -seq).
- Secondary structure extraction from each gene (aragorn -br).
pyaragorn.RNAFinder instances are thread-safe. In addition, the find_rna method is re-entrant. This means you can parameterize an RNAFinder instance once, and then use a pool to process sequences in parallel:
import multiprocessing.pool
import pyaragorn

# `sequences` is assumed to be an iterable of str, bytes or buffer objects
rna_finder = pyaragorn.RNAFinder()
with multiprocessing.pool.ThreadPool() as pool:
    predictions = pool.map(rna_finder.find_rna, sequences)
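When sequences come from several records, it can help to keep each result attached to the identifier of its record. The sketch below is one possible way to do that; the helper name find_rna_by_id, its threads parameter, and the _StubFinder class are hypothetical and not part of PyARAGORN. The only thing assumed about the finder is the find_rna method described above.

```python
import multiprocessing.pool


def find_rna_by_id(rna_finder, sequences, threads=None):
    """Run rna_finder.find_rna on each value of a {record_id: sequence} dict.

    Hypothetical helper: returns a {record_id: result} dict.
    """
    record_ids = list(sequences)
    with multiprocessing.pool.ThreadPool(threads) as pool:
        # ThreadPool.map returns results in the same order as its inputs
        results = pool.map(
            rna_finder.find_rna,
            [sequences[record_id] for record_id in record_ids],
        )
    return dict(zip(record_ids, results))


class _StubFinder:
    """Stand-in (NOT PyARAGORN) so the sketch runs without the library."""

    def find_rna(self, sequence):
        # A real finder returns gene objects; the stub returns the length
        return [len(sequence)]


demo = find_rna_by_id(
    _StubFinder(),
    {"contig_1": "ACGT", "contig_2": "ACGTACGT"},
    threads=2,
)
```

In real use, a pyaragorn.RNAFinder() instance would be passed in place of the stub.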
This project is supported on Python 3.7 and later.
PyARAGORN can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/macOS/Windows) and the Aarch64 architecture (Linux/macOS), as well as the code required to compile from source with Cython:
$ pip install pyaragorn
Let's load a sequence from a GenBank file, use an RNAFinder to find all the RNA genes it contains, and print the anticodon and corresponding amino acid of each detected tRNA. The example below uses the RNAFinder in its default operation mode, which detects both tRNA and tmRNA genes, with the bacterial genetic code (translation table 11):
import Bio.SeqIO
import pyaragorn

record = Bio.SeqIO.read("sequence.gbk", "genbank")

rna_finder = pyaragorn.RNAFinder(translation_table=11)
genes = rna_finder.find_rna(bytes(record.seq))

for gene in genes:
    if gene.type == "tRNA":
        print(
            gene.amino_acid,  # 3-letter code
            gene.begin,       # 1-based, inclusive
            gene.end,
            gene.strand,      # +1 or -1 for direct and reverse strand
            gene.energy,
            gene.anticodon
        )
On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).
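Because the detected genes are plain Python objects, summarizing them needs no output parsing. As a minimal sketch, the helper below counts tRNA genes per amino acid using only the type and amino_acid attributes shown above. The name tally_amino_acids is hypothetical, and the demo objects are stand-ins rather than real PyARAGORN output (the "tmRNA" type string in particular is a placeholder).

```python
import collections
from types import SimpleNamespace


def tally_amino_acids(genes):
    """Count tRNA genes per amino acid, skipping every other gene type."""
    counts = collections.Counter()
    for gene in genes:
        if gene.type == "tRNA":
            counts[gene.amino_acid] += 1
    return counts


# Stand-in objects (NOT real PyARAGORN output) so the sketch runs on its own;
# in practice, pass the result of rna_finder.find_rna(...) instead.
demo_genes = [
    SimpleNamespace(type="tRNA", amino_acid="Ala"),
    SimpleNamespace(type="tRNA", amino_acid="Gly"),
    SimpleNamespace(type="tmRNA"),  # placeholder type string
    SimpleNamespace(type="tRNA", amino_acid="Ala"),
]
demo_counts = tally_amino_acids(demo_genes)

# Print the most frequent amino acids first
for amino_acid, count in demo_counts.most_common():
    print(amino_acid, count)
```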
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
This library is provided under the GNU General Public License v3.0 or later.
ARAGORN and ARWEN were developed by Dean Laslett and are distributed under the terms of the GPLv3 or later as well. See vendor/aragorn for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the ARAGORN authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller Lab.
- [1] Laslett, Dean, and Björn Canbäck. “ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences.” Nucleic Acids Research vol. 32,1 (2004): 11-6. doi:10.1093/nar/gkh152
- [2] Laslett, Dean, and Björn Canbäck. “ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences.” Bioinformatics (Oxford, England) vol. 24,2 (2008): 172-5. doi:10.1093/bioinformatics/btm573