Skip to content

rnajena/SweetSynteny

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WORK IN PROGRESS

This pipeline is work-in-progress, you might fing bugs, some are known, while others remain undiscovered. Before getting desperate, please check out the Issues that are already opened and discussed. We encourage the community to contribute by reporting any issues they encounter on GitHub. Feel free to reach out to me via email (maria.schreiber@uni-jena.de) or open an issue directly. It's important to note that I cannot be held responsible for any results obtained using SweetSynteny or any conclusions drawn from them.


SweetSynteny - Unraveling Microsynteny Patterns

Microsynteny, the conservation of gene order and orientation within small genomic regions across different species, provides crucial insights into evolutionary relationships and functional conservation.

Key features of SweetSynteny:

  • Flexible input:
    1. different number of organisms (from bacteria to eukaryotes)
    2. different searches (cmsearch for sRNA or blast for protein)
  • Sequence-driven clustering and color-pattern Microsynteny clustering
    1. on sequence / structur level (-> see Table): mmseq easy lineclust or cmscan
    2. on microsynteny level: dendrogram
    3. on global level: umap comparing microsynteny cluster
  • Comprehensive results:
    1. phylogenetic trees using dendrogram build by scipy.cluster.hierarchy
    2. statistical summaries
    3. microsynteny plots
    4. statistics on the similarity of the microsynteny locations, e.g. cosinus similarity
    5. Optional: get gene of interest sequence and its promoter sequence (default: 100 nt upstream)
Conitig:Counter Gene Name Start Stop Strand Bio_type Color
NZ_CP013002.1:0 gene-AQ619_RS00960 215167 216307 sense protein_coding #FFFFFF

So, as you can see, with SweetSynteny, your Microsynteny analysis will be, well... sweet!


Graphical Workflow

Workflow graph

Dependencies and installation

The pipeline is written in Nextflow. In order to run SweetSynteny, I recommend creating a conda environment dedicated for NextFlow.

  1. Install miniconda or conda
  2. Create a conda environment and install NextFlow within this environment and install everything else.
    conda create -n nextflow -c bioconda nextflow
    conda activate nextflow
    conda install bioconda::infernal
    conda install bioconda::blast
    conda install bioconda::mmseq
    conda install -c conda-forge matplotlib pandas platformdirs pytest requests seaborn
  3. sugar
    pip install dna_features_viewer
    pip install rnajena-sugar
    
  4. Clone the github repository for the latest version of SweetSynteny
    nextflow pull rnajena/SweetSynteny
  5. Done!

Usage

Let us briefly go over the most important parameters and options.

types infernal|blastn|blastp|tblastn

  • For protein(s) we recommended a (m)fasta of amino acid sequences and tblastn
  • For sRNA(s) we recommend a corresponding CM from RFAM or self-built\
  • You have the choice

genomes_dir FOLDER

  • Please choose 2 or more genomes you want to search and save them here.

  • And use following structure:

    └── genomes_dir

      ├── genome1_dir
    
      │    ├── db.gff
    
      │    └── db.fna
    
      ├── genome2_dir
    
      .    ├── db.gff
    
      .    └── db.fna
    

    ...

query .cm | .fna

  • Path to CM or FASTA of the gen of interest

output_dir FOLDER

  • Path to output folder

gene_of_interest string

  • Name of the gene of interest

cluster_level sequence_level | structur_level

  • Chose clustering for sRNAs

neighbours x:y | x-y

  • Set numbers of neighbours (:) or number of nucleotides (-)
  • x and y should be Integer numbers

scale yes | no

  • Chose if you want to scale the microsynteny plots

plotting png | svg

  • Select which output format you prefer for the microsynteny plots

cluster >2

  • Chose minimal cluster size for DBscan clustering

threshold 0-1

  • Select a similarity threshold for clustering

Use a config file.

See example para.json

Running the pipeline

nextflow run SweetSynteny.nf -params-file /SweetSynteny/para.json

Other tools

Click here for all citations
  • BLAST:

    • Korf, Ian, Mark Yandell, and Joseph Bedell. Blast. " O'Reilly Media, Inc.", 2003.
  • INFERNAL:

    • Nawrocki, Eric P., Diana L. Kolbe, and Sean R. Eddy. "Infernal 1.0: inference of RNA alignments." Bioinformatics 25.10 (2009): 1335-1337.
  • MMSeqs2:

    • Steinegger, M., Söding, J. "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets". Nat Biotechnol 35, 1026–1028 (2017)
  • ETE3:

    • Huerta-Cepas, Jaime, François Serra, and Peer Bork. "ETE 3: reconstruction, analysis, and visualization of phylogenomic data." Molecular biology and evolution 33.6 (2016): 1635-1638.
  • DNA Features Viewer

    • Edinburgh Genome Foundry by Zulko. https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer

Cite us

If you use SweetSynteny for your analysis, please cite our github repository.

@software{Maria_Schreiber_SweetSynteny,
author = {Maria Schreiber, Emanuel Barth, Manja Marz},
license = {MIT},
title = {{SweetSynteny}},
url = {https://github.com/rnajena/SweetSynteny}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published