This repository contains several Snakemake workflows used primarily to analyze different aspects of the biology of transposable elements (TEs) in killifish genomes. In principle, they should work with any other eukaryotic genome as well.
In general terms, the pipeline is able to:
- Quantify the abundance of TEs in genomes at different levels of classification (TE orders and superfamilies).
- Generate distributions of Kimura distances.
- Calculate the degree of fold-enrichment for specific TE superfamilies between species.
- Infer phylogenetic relationships between representative sequences of TE superfamilies.
- Perform analyses of shared and unique TE superfamilies between species.
- Characterize TE superfamily insertions relative to gene regions (exons, introns, upstream, and downstream).
- Identify genes with tandem repeats of specific TE superfamilies and perform functional enrichment analyses of these genes.
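To make the Kimura distance item above concrete, here is a minimal Python sketch of the standard Kimura two-parameter (K2P) distance for one aligned TE copy versus its consensus. It is an illustration only: the function name `kimura_2p` and its arguments are hypothetical, and the pipeline's own calculation may differ (for example, by adjusting for CpG sites).

```python
import math


def kimura_2p(transitions, transversions, aligned_sites):
    """Kimura two-parameter distance: K = -1/2 * ln((1 - 2p - q) * sqrt(1 - 2q)).

    p is the proportion of transitions and q the proportion of
    transversions among the aligned sites. Hypothetical helper,
    not part of the pipeline.
    """
    if aligned_sites <= 0:
        raise ValueError("aligned_sites must be positive")
    p = transitions / aligned_sites
    q = transversions / aligned_sites
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        # The logarithm is undefined: the sequences are too divergent.
        raise ValueError("distance undefined (substitutions saturated)")
    return -0.5 * math.log(a * math.sqrt(b))


if __name__ == "__main__":
    # Made-up counts: 10 transitions and 5 transversions over 100 aligned sites.
    print(round(kimura_2p(10, 5, 100), 4))
```

The corrected distance is always at least as large as the raw proportion of mismatches, because it accounts for multiple substitutions at the same site. A distribution of these values across all copies of a superfamily is what is usually plotted as a repeat landscape.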
The complete workflow is divided into modules whose input and output files are summarized in the following flow diagram:
The requirements vary according to the module used, but a few are mandatory:
Mandatory
- Snakemake
- Docker
- This container
- Bedtools
For tandem repeat analysis
- TRF (Tandem Repeats Finder)
For phylogenetic analysis
- RAxML
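As a quick sanity check before running a module, the command-line tools listed above can be looked up on the `PATH`. The sketch below uses only the Python standard library. The executable names are assumptions (in particular, TRF and RAxML binaries are often installed under versioned or variant names), and the container itself is not checked.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
MANDATORY = ["snakemake", "docker", "bedtools"]
OPTIONAL = {
    "tandem analysis": ["trf"],
    "phylogenetic analysis": ["raxmlHPC"],
}


def missing_tools(names, which=shutil.which):
    """Return the names in `names` that `which` cannot find on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(MANDATORY)
    if missing:
        print("Missing mandatory tools:", ", ".join(missing))
    for module, tools in OPTIONAL.items():
        absent = missing_tools(tools)
        if absent:
            print(f"Missing tools for {module}:", ", ".join(absent))
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.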
The pipeline and its documentation are under continuous development, so please don't hesitate to report any bug or problem you encounter.
Gajardo et al. (in preparation). A recent and rapid genome expansion driven by the amplification of transposable elements in the annual killifish Austrolebias charrua.
Morales et al. (in preparation). Genomic insights of the fish genus Orestias from the Andean Altiplano shed light on its evolutionary history and its phylogenetic placement within the Cyprinodontiformes order.