althonos/pyaragorn

👑 PyARAGORN

Cython bindings and Python interface to ARAGORN, a (t|mt|tm)RNA gene finder.

🗺️ Overview

ARAGORN is a fast method developed by Dean Laslett & Björn Canbäck[1] to identify tRNA and tmRNA genes in genomic sequences, using heuristics to detect potential high-scoring stem-loop structures. The complementary method ARWEN, developed by the same authors[2] to detect metazoan mitochondrial tRNA (mtRNA) genes, was later integrated into ARAGORN.
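To give a feel for what "scoring a stem-loop" means, here is a minimal sketch of a generic hairpin scan: slide a window along the sequence and score how well a 5' arm pairs, antiparallel, with a 3' arm separated by a fixed-length loop. The pairing weights, window sizes, threshold and the helper names `stem_score` and `find_hairpins` are all assumptions made up for illustration; this is not ARAGORN's scoring scheme, whose heuristics are tRNA-specific and more elaborate.

```python
# Illustrative pairing weights (assumed, not ARAGORN's values): G-C pairs score
# highest, A-T lower, G-T wobble pairs lowest, and mismatches are penalised.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "T"): 2, ("T", "A"): 2,
    ("G", "T"): 1, ("T", "G"): 1,
}
MISMATCH = -1


def stem_score(seq, start, stem_len, loop_len):
    """Score how well the 5' arm at `start` pairs with the 3' arm after the loop."""
    left = seq[start:start + stem_len]
    right_start = start + stem_len + loop_len
    right = seq[right_start:right_start + stem_len]
    # The arms pair antiparallel: first base of the 5' arm with last base of the 3' arm.
    return sum(PAIR_SCORE.get((a, b), MISMATCH) for a, b in zip(left, reversed(right)))


def find_hairpins(seq, stem_len=5, loop_len=7, threshold=10):
    """Return (start, score) for every window scoring at least `threshold`, best first."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem_len - loop_len + 1):
        score = stem_score(seq, start, stem_len, loop_len)
        if score >= threshold:
            hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(find_hairpins("GGGCC" + "TTTTTTT" + "GGCCC"))  # [(0, 15)]
```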

PyARAGORN is a Python module that provides Cython bindings to ARAGORN and ARWEN. It interacts directly with the ARAGORN internals, which has the following advantages:

  • single dependency: PyARAGORN is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the ARAGORN binary being present on the end-user machine.
  • no intermediate files: Everything happens in memory, in a Python object you fully control, so you don't have to invoke the ARAGORN CLI using a sub-process and temporary files. Sequences can be passed directly as strings, bytes, or any object implementing the buffer protocol, which avoids the overhead of formatting your input to FASTA for ARAGORN.
  • no output parsing: The detected RNA genes are returned as Python objects with transparent attributes, which makes the output of ARAGORN much easier to handle than parsing its output tables.
  • same results: PyARAGORN is tested to ensure it produces the same results as ARAGORN v1.2.41, the latest release.
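As an illustration of how an in-memory API can accept strings, bytes, and arbitrary buffers alike, the sketch below normalises its input to a one-dimensional byte `memoryview`, copying nothing for bytes-like objects. The helper `as_sequence_view` is hypothetical and only demonstrates the buffer protocol in plain Python; it is not how PyARAGORN handles its input internally.

```python
def as_sequence_view(sequence):
    """Return a 1-D byte view over `sequence` (hypothetical helper, for illustration).

    Text is encoded to ASCII once; bytes-like objects (bytes, bytearray,
    memoryview, array.array('b'), ...) are wrapped without copying.
    """
    if isinstance(sequence, str):
        sequence = sequence.encode("ascii")
    view = memoryview(sequence)  # raises TypeError for non-buffer objects
    if view.ndim != 1 or view.itemsize != 1:
        raise ValueError("expected a one-dimensional buffer of single bytes")
    return view


data = bytearray(b"ACGT")
view = as_sequence_view(data)
data[0] = ord("T")   # mutate the original buffer...
print(bytes(view))   # ...and the view sees the change: b'TCGT'
```

Because the view shares memory with the original object, a large genome held in a `bytearray` or `mmap` never needs to be duplicated just to be scanned.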

📋 Features

PyARAGORN currently supports the following features from the ARAGORN command line:

  • tRNA gene detection (aragorn -t).
  • tmRNA gene detection (aragorn -m).
  • mtRNA gene detection (aragorn -mt).
  • Reporting of batch mode metadata (aragorn -w).
  • Alternative genetic code (aragorn -gc).
  • Custom genetic code (aragorn -gc<n>,BBB=<aa>).
  • Circular and linear topologies (aragorn -c | aragorn -l).
  • Intron length configuration (aragorn -i).
  • Scoring threshold configuration (aragorn -ps).
  • Sequence extraction from each RNA gene (aragorn -seq).
  • Secondary structure extraction from each gene (aragorn -br).
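The genetic code options above determine which amino acid a tRNA is assigned from its anticodon. The sketch below shows that relationship: the anticodon (written 5'→3') is reverse-complemented into the codon it reads, which is then looked up in a codon table. The table is NCBI translation table 1; table 11 assigns the same amino acids and differs only in its start codons. The `overrides` argument mimics, in spirit, the `-gc<n>,BBB=<aa>` custom reassignments. The helper names `codon_table` and `decode_anticodon` are hypothetical and not part of the PyARAGORN API, which reports the amino acid for you.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code), codons enumerated in TCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def codon_table(overrides=None):
    """Build a codon -> one-letter amino acid dict, optionally reassigning codons."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, STANDARD_AA))
    table.update(overrides or {})
    return table


def decode_anticodon(anticodon, table):
    """Return the amino acid read by a tRNA with this 5'->3' anticodon."""
    # The codon pairs antiparallel with the anticodon: reverse-complement it.
    codon = anticodon.upper().translate(COMPLEMENT)[::-1]
    return table[codon]


standard = codon_table()
print(decode_anticodon("GAA", standard))  # F (reads codon TTC)
print(decode_anticodon("CAT", standard))  # M (reads codon ATG)

# Vertebrate mitochondrial code (NCBI table 2) reassigns four codons.
mito = codon_table({"TGA": "W", "AGA": "*", "AGG": "*", "ATA": "M"})
print(decode_anticodon("TCA", standard), decode_anticodon("TCA", mito))  # * W
```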

🧶 Thread-safety

pyaragorn.RNAFinder instances are thread-safe. In addition, the find_rna method is re-entrant. This means you can parameterize an RNAFinder instance once, and then use a pool to process sequences in parallel:

import multiprocessing.pool
import pyaragorn

rna_finder = pyaragorn.RNAFinder()

# `sequences` is any iterable of nucleotide sequences (str, bytes, or buffers)
with multiprocessing.pool.ThreadPool() as pool:
    predictions = pool.map(rna_finder.find_rna, sequences)
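The same pattern can be written with the standard library's `concurrent.futures`. The sketch below uses a hypothetical `GCCounter` class as a stand-in so it runs without PyARAGORN installed; it mirrors the property the paragraph relies on: an object configured once and then only read, whose method touches no shared mutable state. With the real library you would pass `rna_finder.find_rna` where `finder.count` appears.

```python
import concurrent.futures


class GCCounter:
    """Hypothetical stand-in for a finder: configured once, then only read."""

    def __init__(self, bases="GC"):
        self.bases = frozenset(bases)  # immutable configuration

    def count(self, sequence):
        # Uses no shared mutable state, so concurrent calls cannot interfere.
        return sum(base in self.bases for base in sequence.upper())


finder = GCCounter()
sequences = ["ACGT", "GGGG", "ATAT"]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Like ThreadPool.map, Executor.map yields results in input order.
    results = list(executor.map(finder.count, sequences))

print(results)  # [2, 4, 0]
```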

🔧 Installing

This project is supported on Python 3.7 and later.

PyARAGORN can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/macOS/Windows) and the AArch64 architecture (Linux/macOS), as well as the code required to compile from source with Cython:

$ pip install pyaragorn

💡 Example

Let's load a sequence from a GenBank file, use an RNAFinder to find all the tRNA genes it contains, and print the amino acid and anticodon of each detected tRNA, along with its coordinates, strand and energy.

Here the RNAFinder runs in its default mode, which detects both tRNA and tmRNA genes, but uses the bacterial genetic code (translation table 11):

import Bio.SeqIO
import pyaragorn

record = Bio.SeqIO.read("sequence.gbk", "genbank")

rna_finder = pyaragorn.RNAFinder(translation_table=11)
genes = rna_finder.find_rna(bytes(record.seq))

for gene in genes:
    if gene.type == "tRNA":
        print(
            gene.amino_acid,   # 3-letter code
            gene.begin,        # 1-based, inclusive
            gene.end,
            gene.strand,       # +1 or -1 for direct and reverse strand
            gene.energy,
            gene.anticodon
        )

On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).
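The comments in the example fix the coordinate conventions: `begin` and `end` are 1-based and inclusive, and `strand` is +1 or -1. The sketch below shows what that means when slicing a gene out of the genome yourself. PyARAGORN lists sequence extraction among its features, so prefer the library's own facility where available; the hypothetical `gene_sequence` helper only illustrates the conventions, and assumes `begin <= end` (a gene that does not wrap around the origin of a circular sequence).

```python
_COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def gene_sequence(genome, begin, end, strand):
    """Slice a gene out of `genome` (bytes) using 1-based inclusive coordinates.

    Assumes begin <= end, i.e. the gene does not wrap around the origin.
    """
    fragment = genome[begin - 1:end]  # 1-based inclusive -> 0-based half-open
    if strand == -1:
        fragment = fragment.translate(_COMPLEMENT)[::-1]  # reverse complement
    return fragment


genome = b"AAACCCGGGTTT"
print(gene_sequence(genome, 4, 9, +1))  # b'CCCGGG'
print(gene_sequence(genome, 1, 4, -1))  # b'GTTT'
```

In the example above, this would be called as `gene_sequence(bytes(record.seq), gene.begin, gene.end, gene.strand)`.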

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing an issue about a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0 or later. ARAGORN and ARWEN were developed by Dean Laslett and are distributed under the terms of the GPLv3 or later as well. See vendor/aragorn for more information.

This project is in no way affiliated with, sponsored, or otherwise endorsed by the ARAGORN authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller Lab.

📚 References

  • [1] Laslett, Dean, and Björn Canbäck. “ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences.” Nucleic Acids Research vol. 32,1 (2004): 11-6. doi:10.1093/nar/gkh152
  • [2] Laslett, Dean, and Björn Canbäck. “ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences.” Bioinformatics (Oxford, England) vol. 24,2 (2008): 172-5. doi:10.1093/bioinformatics/btm573