GitHub - zhangrengang/evolution_example: Example for evlutionary analyses with Orthology Index

$ tree
├── all_species_gene.gff        # all GFF
├── cds.fa                      # all CDS sequences
├── chr.list					$ all chromosome information
├── pep.faa                     # all protein sequences
├── species.design              # speceis list
├── OrthoFinder                 # input for OrthoFinder
│   ├── Angelica_sinensis.fasta
│   ├── Apium_graveolens.fasta
│   ├── ......
└── wgdi                        # input for WGDI
    ├── Angelica_sinensis-Angelica_sinensis.blast
    ├── Angelica_sinensis-Angelica_sinensis.conf
    ├── Angelica_sinensis-Angelica_sinensis.ctl   # for `soi dotplot`
    ├── Angelica_sinensis.gff
    ├── Angelica_sinensis.lens
    ├── ......
 ......

Note: the GENE ID is needed to label with SPECIES ID (e.g. Angelica_sinensis|AS01G00001) for uniqueness and compatibility (legacy from OrthoMCL). The CHROMosome ID should also be unique (e.g. As1, As2) to avoid conflicts (legacy from MCscanX). The IDs can be easily labeled for your own data; for example:

SP=Angelica_sinensis
# using sed to label fasta files:
sed 's/>/>'$SP'|/' $SP.pep > $SP.fasta  # always using the separator '|'

SP2=As
# using awk and perl to label MCscanX/WGDI gff files or JCVI bed files:
cat $SP.gff0 | awk -v sp=$SP -v OFS="\t" '{$2=sp"|"$2;print $0}' | perl -pe 's/^\D+/'$SP2'/' > $SP.gff
cat $SP.bed0 | awk -v sp=$SP -v OFS="\t" '{$4=sp"|"$4;print $0}' | perl -pe 's/^\D+/'$SP2'/' > $SP.bed

SP and SP2 can be the same. It will be convenient to use a for/while loop to process all the species.

Installation

If you have installed OrthoIndex and activated the conda environment, all the commands used in this pipeline should have been installed and can be executed in the shell.

Run OrthoFinder for orthology

To infer 'orthology' using OrthoFinder2:

orthofinder -f OrthoFinder/ -M msa -t 60

orthofinder can by replaced by Proteinortho6, Broccoli and SonicParanoid2.

Run WGDI for synteny

This repo has provided all input files (*.lens, *.gff, *.blast, *.conf, *.ctl) for the downstream pipeline. But for your own data, these input files can be generated by the following commands:

indir=.
outdir=wgdi
sdesign=species.design

# to generate *.lens and *.gff files for `wgdi`, and *.ctl files for `soi dotplot`
soi-syn to_wgdi indir=$indir outdir=$outdir species=$sdesign gff=all_species_gene.gff chrLst=chr.list min_genes=100
# *.blast files for `wgdi`
soi-orth to_wgdi $indir/OrthoFinder/OrthoFinder/Results_* species=$sdesign outdir=$outdir

cd $outdir
chrl=../chr.list
../src/comb2 `cat ../species.design` | while read LINE
do
	arr=($LINE)
	SP1=${arr[0]}	# y
	SP2=${arr[1]}	# x
	conf=$SP1-$SP2.conf
	ctl=$SP1-$SP2.ctl
	# *.conf file for `wgdi`
	sh ../src/wgdi-conf.sh $SP1 $SP2 > $conf
#	# ctl file for `soi dotplot`
#	soi-ctl by_genes $chrl $SP1 $SP2 > $ctl
done
cd -

To detect 'synteny' by WGDI, with visualization by SOI :

cd wgdi

../src/comb2 `cat ../species.design` | while read LINE
do
    arr=($LINE)
    SP1=${arr[0]}
    SP2=${arr[1]}
    prefix=$SP1-$SP2
    conf=$prefix.conf

    # blast (can be skipped)
#    diamond blastp -q ../OrthoFinder/$SP1.fasta -d ../OrthoFinder/$SP2.fasta -o $prefix.blast --more-sensitive -p 10 --quiet -e 0.001

    # call synteny
    wgdi -icl $conf

    # dot plot colored by Orthology Index
    soi dotplot -s $prefix.collinearity \
        -g ../all_species_gene.gff -c $prefix.ctl  \
        --xlabel $SP1 --ylabel $SP2 \
        --ks-hist --max-ks 1 -o $prefix.io    \
        --plot-ploidy --gene-axis --number-plots \
        --ofdir ../OrthoFinder/OrthoFinder/Results_*/ \
        --of-color

    # to show only orthology
    soi dotplot -s $prefix.collinearity \
        -g ../all_species_gene.gff -c $prefix.ctl  \
        --xlabel $SP1 --ylabel $SP2 \
        --ks-hist --max-ks 1 -o $prefix.io    \
        --plot-ploidy --gene-axis --number-plots \
        --ofdir ../OrthoFinder/OrthoFinder/Results_*/ \
        --of-color --of-ratio 0.6

done

cd ..

If you also need Ks-based visualization:

cd wgdi

../src/comb2 `cat ../species.design` | while read LINE
do
    arr=($LINE)
    SP1=${arr[0]}
    SP2=${arr[1]}
    prefix=$SP1-$SP2
    conf=$prefix.conf

    # calculate Ks
    wgdi -ks $conf

    # dot plot colored by Ks
    soi dotplot -s $prefix.collinearity \
        -g ../all_species_gene.gff -c $prefix.ctl  \
        --kaks $prefix.collinearity.ks \
        --xlabel $SP1 --ylabel $SP2 \
        --ks-hist --max-ks 1.5 -o $prefix     \
        --plot-ploidy --gene-axis --number-plots

    # to show only orthology
    soi dotplot -s $prefix.collinearity \
        -g ../all_species_gene.gff -c $prefix.ctl  \
        --kaks $prefix.collinearity.ks \
        --xlabel $SP1 --ylabel $SP2 \
        --ks-hist --max-ks 1.5 -o $prefix     \
        --plot-ploidy --gene-axis --number-plots \
        --ofdir ../OrthoFinder/OrthoFinder/Results_*/ \
        --of-ratio 0.6       # filtering by OI

done

cd ..

wgdi can be replaced by JCVI and MCscanX.

Run SOI for orthologous synteny and species tree

To cluster syntenic orthogroups (SOGs) and construct phylogenomic analyses:

mkdir -p phylogenomics && cd phylogenomics

# to filter collinearity
soi filter -s ../wgdi/*.collinearity -o ../OrthoFinder/OrthoFinder/Results_*/ -c 0.6 -stat collinearity.ortho.stats > collinearity.ortho

# to cluster SOGs excluding outgroups that do not share the lineage-specific WGD
soi cluster -s collinearity.ortho -outgroup Lonicera_japonica Ilex_polyneura Vitis_vinifera -prefix cluster
# to add outgroups
soi outgroup -s collinearity.ortho -og cluster.mcl -outgroup Lonicera_japonica Ilex_polyneura Vitis_vinifera > cluster.mcl.plus

# to build multi-copy or single-copy gene trees
soi phylo -og cluster.mcl.plus -pep ../pep.faa -cds ../cds.fa -both -pre sog -fast -mm 0.4 -p 80 -tmp tmp.mc.0.4
soi phylo -og cluster.mcl.plus -pep ../pep.faa -cds ../cds.fa -both -pre sog -fast -mm 0.2 -p 80 -tmp tmp.sc.0.2 -sc -concat -trimal_opts " -gappyout" -iqtree_opts " -B 1000"

# to infer coalescent‐based species tree
astral-pro --root Vitis_vinifera sog.mc.cds.mm0.4.genetrees > sog.mc.cds.mm0.4.genetrees.astral
astral-hybrid --root Vitis_vinifera sog.sc.cds.mm0.2.genetrees > sog.sc.cds.mm0.2.genetrees.astral

# to infer concatenation‐based species tree
iqtree2 -s sog.sc.cds.mm0.2.concat.aln -T 60 -B 1000 -mset GTR -o Vitis_vinifera

Note: although we set a unified cutoff (0.6) in the pipeline, users should manually check the resulted dot plots for confirmation, and some extremely complex cases showing unexpected patterns need to be investigated on a case-by-case basis.

If all species share the WGD(s), -outgroup should not be used, and just run soi cluster and skip soi outgroup:

soi cluster -s collinearity.ortho -prefix cluster

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github/workflows		.github/workflows
CDS		CDS
OrthoFinder		OrthoFinder
src		src
wgdi		wgdi
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
chr.list		chr.list
pipeline.sh		pipeline.sh
species.design		species.design

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Table of Contents

Prepare data

Installation

Run OrthoFinder for orthology

Run WGDI for synteny

Run SOI for orthologous synteny and species tree

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

zhangrengang/evolution_example

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Prepare data

Installation

Run OrthoFinder for orthology

Run WGDI for synteny

Run SOI for orthologous synteny and species tree

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages