This pipeline is a work in progress. You might find bugs; some are known, while others remain undiscovered. Before getting desperate, please check the Issues that have already been opened and discussed. We encourage the community to contribute by reporting any issues they encounter on GitHub. Feel free to reach out to me via email (maria.schreiber@uni-jena.de) or open an issue directly. Please note that I cannot be held responsible for any results obtained using SweetSynteny or any conclusions drawn from them.
Microsynteny, the conservation of gene order and orientation within small genomic regions across different species, provides crucial insights into evolutionary relationships and functional conservation.
Key features of SweetSynteny:
- Flexible input:
  - different numbers of organisms (from bacteria to eukaryotes)
  - different searches (`cmsearch` for sRNA or `blast` for protein)
- Sequence-driven clustering and color-pattern microsynteny clustering:
  - on sequence / structure level (-> see Table): `mmseqs easy-linclust` or `cmscan`
  - on microsynteny level: `dendrogram`
  - on global level: UMAP comparing microsynteny clusters
- Comprehensive results:
  - phylogenetic trees using `dendrogram` built by scipy.cluster.hierarchy
  - statistical summaries
  - microsynteny plots
  - statistics on the similarity of the microsynteny locations, e.g. cosine similarity
  - Optional: get the gene of interest sequence and its promoter sequence (default: 100 nt upstream)
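
As a rough illustration of the similarity statistics listed above, cosine similarity between two microsynteny locations can be computed from count vectors. This is only a sketch and not SweetSynteny's implementation: it assumes each location is summarized as counts per color, and the helper names `color_counts` and `cosine_similarity` as well as the example colors are hypothetical.

```python
import math
from collections import Counter


def color_counts(colors):
    """Count how often each color occurs in one microsynteny location."""
    return Counter(colors)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty locations
    return dot / (norm_a * norm_b)


# Hypothetical color patterns of two genomic neighbourhoods.
loc1 = color_counts(["#FFFFFF", "#FF0000", "#00FF00", "#FF0000"])
loc2 = color_counts(["#FF0000", "#FF0000", "#0000FF"])
similarity = cosine_similarity(loc1, loc2)
print(f"cosine similarity: {similarity:.3f}")
```

A value of 1 means both locations have the same color composition, while 0 means they share no color at all. Note that this simple variant ignores gene order and orientation.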
Contig:Counter | Gene Name | Start | Stop | Strand | Bio_type | Color |
---|---|---|---|---|---|---|
NZ_CP013002.1:0 | gene-AQ619_RS00960 | 215167 | 216307 | sense | protein_coding | #FFFFFF |
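
To illustrate how the Start, Stop and Strand columns of such a row could drive the optional promoter extraction, here is a minimal sketch. It assumes 1-based inclusive coordinates (as in GFF), that the strand column holds `sense` or `antisense`, and that the promoter is simply the region directly upstream of the gene; the function names and the toy sequence are hypothetical and not taken from SweetSynteny's code.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def promoter_region(contig_seq, start, stop, strand, upstream=100):
    """Return the region upstream of a gene, read in the gene's direction.

    Assumes 1-based inclusive start/stop and strand in {"sense", "antisense"}.
    The region is truncated at the contig boundaries.
    """
    if strand == "sense":
        lo = max(0, start - 1 - upstream)
        return contig_seq[lo:start - 1]
    if strand == "antisense":
        # Upstream of an antisense gene lies after its stop coordinate.
        hi = min(len(contig_seq), stop + upstream)
        return reverse_complement(contig_seq[stop:hi])
    raise ValueError(f"unknown strand: {strand!r}")


# Toy contig with a hypothetical gene at positions 7..9 ("GGG").
contig = "AAACCCGGGTTT"
print(promoter_region(contig, 7, 9, "sense", upstream=3))
print(promoter_region(contig, 7, 9, "antisense", upstream=3))
```

Reading the antisense promoter as a reverse complement keeps both results in the 5' to 3' direction of the gene, which is usually what downstream motif searches expect.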
So, as you can see, with SweetSynteny, your Microsynteny analysis will be, well... sweet!
The pipeline is written in Nextflow. To run SweetSynteny, I recommend creating a conda environment dedicated to Nextflow.
- Install miniconda or conda
- Create a conda environment, install Nextflow within this environment, and install everything else:

      conda create -n nextflow -c bioconda nextflow
      conda activate nextflow
      conda install bioconda::infernal
      conda install bioconda::blast
      conda install bioconda::mmseqs2
      conda install -c conda-forge matplotlib pandas platformdirs pytest requests seaborn

- Install DNA Features Viewer and sugar:

      pip install dna_features_viewer
      pip install rnajena-sugar

- Pull the latest version of SweetSynteny from the GitHub repository:

      nextflow pull rnajena/SweetSynteny

- Done!
Let us briefly go over the most important parameters and options.
types infernal|blastn|blastp|tblastn
- Choose the search type
- For protein(s), we recommend a (multi-)FASTA of amino acid sequences and tblastn
- For sRNA(s), we recommend a corresponding CM from Rfam or a self-built one
genomes_dir FOLDER
- Please choose 2 or more genomes you want to search and save them here.
- Use the following structure:

      └── genomes_dir
          ├── genome1_dir
          │   ├── db.gff
          │   └── db.fna
          ├── genome2_dir
          .   ├── db.gff
          .   └── db.fna
          ...
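
Before starting a run, it can help to check that the genome folder follows this layout. The following sketch is not part of SweetSynteny; the function name `check_genomes_dir` is hypothetical, and it only assumes what is stated above: at least two subfolders, each containing `db.gff` and `db.fna`.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("db.gff", "db.fna")


def check_genomes_dir(genomes_dir):
    """Return (complete genome folders, problems) for a genomes_dir layout."""
    root = Path(genomes_dir)
    valid, problems = [], []
    if not root.is_dir():
        return valid, [f"{root} is not a directory"]
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [name for name in REQUIRED_FILES if not (sub / name).is_file()]
        if missing:
            problems.append(f"{sub.name}: missing {', '.join(missing)}")
        else:
            valid.append(sub.name)
    if len(valid) < 2:
        problems.append("at least 2 complete genome folders are required")
    return valid, problems


# Demo on a throwaway layout: one complete genome folder, one incomplete.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "genomes_dir"
    layout = {"genome1_dir": REQUIRED_FILES, "genome2_dir": ("db.gff",)}
    for folder, files in layout.items():
        (root / folder).mkdir(parents=True)
        for name in files:
            (root / folder / name).touch()
    valid, problems = check_genomes_dir(root)

print("complete genome folders:", valid)
for problem in problems:
    print("problem:", problem)
```

For real data, call `check_genomes_dir` with the path you plan to pass as `genomes_dir` and fix any reported problems first.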
query .cm | .fna
- Path to the CM or FASTA of the gene of interest
output_dir FOLDER
- Path to output folder
gene_of_interest string
- Name of the gene of interest
cluster_level sequence_level | structur_level
- Choose the clustering level for sRNAs
neighbours x:y | x-y
- Set the number of neighbours (separator `:`) or the number of nucleotides (separator `-`)
- x and y should be integers
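
The two accepted notations can be told apart by their separator. The sketch below shows one way to parse such a value; it is an illustration only, the names `Neighbours` and `parse_neighbours` are hypothetical, and it does not claim to match how SweetSynteny interprets x and y internally.

```python
import re
from typing import NamedTuple


class Neighbours(NamedTuple):
    x: int
    y: int
    unit: str  # "genes" for x:y, "nucleotides" for x-y


_PATTERN = re.compile(r"^(\d+)([:-])(\d+)$")


def parse_neighbours(value):
    """Parse 'x:y' (gene counts) or 'x-y' (nucleotide counts)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"expected 'x:y' or 'x-y' with integers, got {value!r}")
    x, sep, y = match.groups()
    unit = "genes" if sep == ":" else "nucleotides"
    return Neighbours(int(x), int(y), unit)


print(parse_neighbours("4:4"))
print(parse_neighbours("2000-2000"))
```

Rejecting anything that is not two non-negative integers with exactly one separator catches typos such as `4;4` before the pipeline starts.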
scale yes | no
- Choose whether to scale the microsynteny plots
plotting png | svg
- Select which output format you prefer for the microsynteny plots
cluster >2
- Choose the minimal cluster size for `DBSCAN` clustering
threshold 0-1
- Select a similarity threshold for clustering
See the example para.json.
nextflow run SweetSynteny.nf -params-file /SweetSynteny/para.json
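
If you prefer to generate the parameter file programmatically, the sketch below collects the parameters described above into a JSON file. The keys are the parameter names from this README; all values (paths, gene name, numbers) are placeholders, the value types are assumptions, and the helpers `validate` and `write_params` are hypothetical. The para.json shipped with the repository remains the authoritative reference.

```python
import json
from pathlib import Path

# Allowed values as listed in the parameter overview.
ALLOWED = {
    "types": {"infernal", "blastn", "blastp", "tblastn"},
    "cluster_level": {"sequence_level", "structur_level"},
    "scale": {"yes", "no"},
    "plotting": {"png", "svg"},
}

# Placeholder values; adapt the paths and names to your own data.
params = {
    "types": "infernal",
    "genomes_dir": "genomes_dir",
    "query": "query.cm",
    "output_dir": "results",
    "gene_of_interest": "my_srna",
    "cluster_level": "sequence_level",
    "neighbours": "4:4",
    "scale": "yes",
    "plotting": "svg",
    "cluster": 3,
    "threshold": 0.5,
}


def validate(params):
    """Return a list of human-readable problems (empty if none were found)."""
    errors = []
    for key, allowed in ALLOWED.items():
        if params.get(key) not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    threshold = params.get("threshold")
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        errors.append("threshold must be a number between 0 and 1")
    return errors


def write_params(params, path):
    """Write the parameters as JSON after a basic sanity check."""
    errors = validate(params)
    if errors:
        raise ValueError("; ".join(errors))
    Path(path).write_text(json.dumps(params, indent=2) + "\n")


errors = validate(params)
print("problems found:", errors)
```

Calling `write_params(params, "my_para.json")` would then produce a file that can be passed to Nextflow with `-params-file my_para.json`.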
Citations:
- BLAST: Korf, Ian, Mark Yandell, and Joseph Bedell. "BLAST." O'Reilly Media, Inc., 2003.
- INFERNAL: Nawrocki, Eric P., Diana L. Kolbe, and Sean R. Eddy. "Infernal 1.0: inference of RNA alignments." Bioinformatics 25.10 (2009): 1335-1337.
- MMseqs2: Steinegger, Martin, and Johannes Söding. "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets." Nature Biotechnology 35 (2017): 1026-1028.
- ETE3: Huerta-Cepas, Jaime, François Serra, and Peer Bork. "ETE 3: reconstruction, analysis, and visualization of phylogenomic data." Molecular Biology and Evolution 33.6 (2016): 1635-1638.
- DNA Features Viewer: Edinburgh Genome Foundry by Zulko. https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer
If you use SweetSynteny for your analysis, please cite our GitHub repository.
@software{Maria_Schreiber_SweetSynteny,
author = {Maria Schreiber and Emanuel Barth and Manja Marz},
license = {MIT},
title = {{SweetSynteny}},
url = {https://github.com/rnajena/SweetSynteny}
}