environiment: yaml.enrichment
aim: gain functional insight on the genes associated to crema - vicia interactions.
GO annotation has been carried out using
-
PANNZER using no filtering and defeault parameters
-
EggNOG-mapper using Insecta and Viridiplantae as Taxonomic scope and leveraging all annotations (including electronic ones).
PANNZER assigned 41331 and 99382 GO-terms respectively to crema and vicia (to 7930 and 18695 transcripts).
EggNOG-mapper assigned 391736 and 690388 GO-terms respectively to crema and vicia (to 5061 and 11897 transcripts).
Subsequent enrichment analysis legerage topGO and for this purpose the PANNZER and EggNOG-mapper outputs need to be reformatted using:
sh scripts/reformat_GOs.sh
enrichment/PANNZER_crema/GO_crema_predictions.lst
enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv
enrichment/GO_crema/GO_crema_geneUniverse
and
sh scripts/reformat_GOs.sh
enrichment/PANNZER_vicia/GO_vicia_predictions.lst
enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv
enrichment/GO_vicia/GO_vicia_geneUniverse
This script take as input:
-
PANNZER output
-
eggNOGmapper output
-
the output file name
In total we annotated with GOterms 8615 and 18698 transcripts for crema and vicia.
Since crema expression data encompasses two "tissues" the correct approach to generate an enrichment is to use as background only the genes which are found to be expressed in that particular tissue.
Thus for crema we need beforehand to create a tissue-specific gene universes, respectively for AD and CT:
for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_AD_vs_*_AD.DESeq2.DE_results |
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_AD
for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_CT_vs_*_CT.DESeq2.DE_results |
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_CT
Then we build two folders to split the DE results:
cd red/paint/abundances/crema/crema_deseq2_gene
mkdir AD CT
cp crema_A_AD_vs* AD/
cp crema_A_CT_vs* CT/
Then the GO enrichment analysisis is performed using the following commands:
Rscript scripts/GO_enrichment.Rscript enrichment/GO_vicia/GO_vicia_geneUniverse
abundances/vicia/vicia_deseq_gene/ BP 5 classic fisher 0.01 enrichment/GO_vicia/
Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_CT
abundances/crema/crema_deseq2_gene/CT/ BP 5 classic fisher 0.01 enrichment/GO_crema/
Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_AD
abundances/crema/crema_deseq2_gene/AD/ BP 5 classic fisher 0.01 enrichment/GO_crema/
The Rscript has several positional arguments:
- the gene universe
- the input folder containing the genes subsets considered
- the onyology considered (BP/MF/CC)
- the minimum node size
- the algorythm
- the statistical test
- the p value cutoff
Raw outputs are available in:
enrichment/GO_vicia/*tsv
here
enrichment/GO_crema/*tsv
here
EggNOG annotations for DE genes were extracted using:
for i in $(ls abundances/crema/crema_deseq2_gene/crema_*.lst); do
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}');
sh scripts/extract_eggNOG_annotations.sh $i enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv
> enrichment/eggNOG_crema/"$basename"_eggNOG_annotations.tsv; done
for i in $(ls abundances/vicia/vicia_deseq2_gene/vicia_*.lst); do
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}');
sh scripts/extract_eggNOG_annotations.sh $i enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv
> enrichment/eggNOG_vicia/"$basename"_eggNOG_annotations.tsv; done
Raw outputs are available in:
enrichment/eggNOG_vicia/*tsv
here
enrichment/eggNOG_crema/*tsv
here
GO enrichment analyses were plotted using the Rscript plot_GO_enrichment.Rscript
.
NB: this part has been carried out from any environiment due to conflicts.
The Rscript positional arguments:
- enriched goterms alog with their p value /
red_enrichment.tsv
files generated byGO_enrichment.Rscript
- similarity calculation method
For vicia:
Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_UP_BP_enrichment.tsv "orange" 0.01
Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_DN_BP_enrichment.tsv "lightblue" 0.01
For crema:
for i in enrichment/GO_crema/*_UP_BP_enrichment.tsv;
do Rscript plot_GO_enrichment.R $i "orange" 0.05;
done
for i in enrichment/GO_crema/*_DN_BP_enrichment.tsv;
do Rscript plot_GO_enrichment.R $i "lightblue" 0.05;
done
The full list can be found here - WIP.
Then the genes associated to GOterms of interest were extracted; here are some examples for:
- vicia downregulated genes associated to GO:0042136 - neurotransmitter biosynthetic process:
Rscript scripts/GO_extract.Rscript enrichment/GO_vicia/enrichment.Rdata
GO:0042136 abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst
[1] "TRINITY_DN36133_c0_g1" "TRINITY_DN833_c0_g1"
Transcripts sequences were extracted as amminoacids sequences using:
for i in $(cat crema/abundances/crema/crema_deseq2_gene/crema_A_CT_vs_B_CT_UP_genes.lst); do
sed -n -e "/$i/,/TRINITY/ p" annotations/crema/crema.Trinity.fasta.transdecoder.pep | head -n -1 | awk '{print $1}';
done
Additional KO annotation has been carried out using BlastKOALA leveraging the correct Taxonomy ID and the family_eukaryotes database. BlastKOALA assigned XXX and YYY KO-terms respectively to crema and vicia.
To further confirm the functional meaning of the observed transcriptional responses, we also leveraged th KO annotation generated by BlastKOALA.
grep -f abundances/vicia/vicia_deseq_gene/vicia_UP_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_UP_genes
grep -f abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_DN_genes
and
for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst;
do
name=$(echo ${i::-10} | awk -F "/" '{print $NF}');
grep -f $i enrichment/blastKOALA_crema/crema_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_crema/$name;
done
Raw outputs are available in:
enrichment/KO_vicia/*tsv
here
enrichment/KO_crema/*tsv
here
For crema a signalp analysis was carried out and then the number of protein with a secretion peptide was assigned to each module with:
for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst;
do
secreted=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -c -w SP);
total=$(wc -l $i | awk '{print $1}');
genes=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -w SP | awk -F '.' '{print $1}' | tr '\n\t\r ' '\t');
echo -e "$i\t$secreted\t$total\t$genes";
done > single_genes_fun/crema_signalp.lst
The associated otput can be found in the folder single_genes_fun
here.
Furthermore, for crema we manually characterised:
- venom-associate proteins
- antimicrobial peptides
- aggression-related genes
- genes involved in dopamine methabolism
Databases were respectively:
- one based on VenomZone (insecta)
- one using "taxonomy:"Insecta [50557]" keyword:"Antimicrobial [KW-0929]" AND reviewed:yes" on Uniprot
- one manually curated based on the relevant litterature
- one manually curated for genes involved in dopamine methabolism
Only the workflow for the antimicrobial peptides will be shown, as others are identical. After building the blastp db using:
makeblastdb -in antimicrobial_peptides.fasta -dbtype prot
blastp searches were carried out, using:
blastp -query annotations/crema/crema.Trinity.fasta.transdecoder.pep
-db enrichment/crema_antimb/antimicrobial_peptides.fasta
-evalue 1e-15 -qcov_hsp_perc 80 -outfmt "6 qseqid sseqid evalue qcovs"
-max_target_seqs 1 -max_hsps 1 > enrichment/crema_antimb/crema_antimb_blastp.out
Subsequently, the DE statistics for
(a) the comparison between "control" and "treated" conditions and
(b) body parts
can be obtained respectively with:
sh scripts/extract_exp_val_crema_tissue_contrast.sh enrichment/crema_antimb/crema_antimb_blastp.out
> enrichment/crema_antimb/crema_antimb_expression_tissue_contrast.tsv
sh scripts/extract_exp_val_crema_treatm_contrast.sh enrichment/crema_antimb/crema_antimb_blastp.out
> enrichment/crema_antimb/crema_antimb_expression_treatm_contrast.tsv
The associated otputs can be found:
- here for venom-associated proteins
- here for antimicrobial peptides
- here for aggression-related genes
- here for genes involved in dopamine methabolism
Last, but not least, manually curated description of genes can be found here for crema and here for vicia.