genes functional annotation

environiment: yaml.enrichment

aim: gain functional insight on the genes associated to crema - vicia interactions.

GO annotation has been carried out using

PANNZER using no filtering and defeault parameters
EggNOG-mapper using Insecta and Viridiplantae as Taxonomic scope and leveraging all annotations (including electronic ones).

PANNZER assigned 41331 and 99382 GO-terms respectively to crema and vicia (to 7930 and 18695 transcripts).

EggNOG-mapper assigned 391736 and 690388 GO-terms respectively to crema and vicia (to 5061 and 11897 transcripts).

Subsequent enrichment analysis legerage topGO and for this purpose the PANNZER and EggNOG-mapper outputs need to be reformatted using:

sh scripts/reformat_GOs.sh 
enrichment/PANNZER_crema/GO_crema_predictions.lst 
enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv 
enrichment/GO_crema/GO_crema_geneUniverse

and

sh scripts/reformat_GOs.sh 
enrichment/PANNZER_vicia/GO_vicia_predictions.lst 
enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv 
enrichment/GO_vicia/GO_vicia_geneUniverse

This script take as input:

PANNZER output
eggNOGmapper output
the output file name

In total we annotated with GOterms 8615 and 18698 transcripts for crema and vicia.

Since crema expression data encompasses two "tissues" the correct approach to generate an enrichment is to use as background only the genes which are found to be expressed in that particular tissue.

Thus for crema we need beforehand to create a tissue-specific gene universes, respectively for AD and CT:

for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_AD_vs_*_AD.DESeq2.DE_results | 
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_AD

for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_CT_vs_*_CT.DESeq2.DE_results | 
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_CT

Then we build two folders to split the DE results:

cd red/paint/abundances/crema/crema_deseq2_gene
mkdir AD CT
cp crema_A_AD_vs* AD/
cp crema_A_CT_vs* CT/

Then the GO enrichment analysisis is performed using the following commands:

Rscript scripts/GO_enrichment.Rscript enrichment/GO_vicia/GO_vicia_geneUniverse 
abundances/vicia/vicia_deseq_gene/ BP 5 classic fisher 0.01 enrichment/GO_vicia/

Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_CT 
abundances/crema/crema_deseq2_gene/CT/ BP 5 classic fisher 0.01 enrichment/GO_crema/

Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_AD 
abundances/crema/crema_deseq2_gene/AD/ BP 5 classic fisher 0.01 enrichment/GO_crema/

The Rscript has several positional arguments:

the gene universe
the input folder containing the genes subsets considered
the onyology considered (BP/MF/CC)
the minimum node size
the algorythm
the statistical test
the p value cutoff

Raw outputs are available in:

enrichment/GO_vicia/*tsv here

enrichment/GO_crema/*tsv here

EggNOG annotations for DE genes were extracted using:

for i in $(ls abundances/crema/crema_deseq2_gene/crema_*.lst); do 
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}'); 
sh scripts/extract_eggNOG_annotations.sh $i  enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv 
> enrichment/eggNOG_crema/"$basename"_eggNOG_annotations.tsv; done

for i in $(ls abundances/vicia/vicia_deseq2_gene/vicia_*.lst); do 
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}'); 
sh scripts/extract_eggNOG_annotations.sh $i  enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv 
> enrichment/eggNOG_vicia/"$basename"_eggNOG_annotations.tsv; done

Raw outputs are available in:

enrichment/eggNOG_vicia/*tsv here

enrichment/eggNOG_crema/*tsv here

GO enrichment analyses were plotted using the Rscript plot_GO_enrichment.Rscript.

NB: this part has been carried out from any environiment due to conflicts.

The Rscript positional arguments:

enriched goterms alog with their p value / red_enrichment.tsv files generated by GO_enrichment.Rscript
similarity calculation method

For vicia:

Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_UP_BP_enrichment.tsv "orange" 0.01

Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_DN_BP_enrichment.tsv "lightblue" 0.01

For crema:

for i in enrichment/GO_crema/*_UP_BP_enrichment.tsv; 
	do Rscript plot_GO_enrichment.R $i "orange" 0.05; 
	done

for i in enrichment/GO_crema/*_DN_BP_enrichment.tsv; 
	do Rscript plot_GO_enrichment.R $i "lightblue" 0.05; 
	done

The full list can be found here - WIP.

Then the genes associated to GOterms of interest were extracted; here are some examples for:

vicia downregulated genes associated to GO:0042136 - neurotransmitter biosynthetic process:

Rscript scripts/GO_extract.Rscript enrichment/GO_vicia/enrichment.Rdata 
GO:0042136 abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst

[1] "TRINITY_DN36133_c0_g1" "TRINITY_DN833_c0_g1"

Transcripts sequences were extracted as amminoacids sequences using:

for i in $(cat crema/abundances/crema/crema_deseq2_gene/crema_A_CT_vs_B_CT_UP_genes.lst); do 
sed -n -e "/$i/,/TRINITY/ p" annotations/crema/crema.Trinity.fasta.transdecoder.pep | head -n -1 | awk '{print $1}';
done

Additional KO annotation has been carried out using BlastKOALA leveraging the correct Taxonomy ID and the family_eukaryotes database. BlastKOALA assigned XXX and YYY KO-terms respectively to crema and vicia.

To further confirm the functional meaning of the observed transcriptional responses, we also leveraged th KO annotation generated by BlastKOALA.

grep -f abundances/vicia/vicia_deseq_gene/vicia_UP_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_UP_genes
grep -f abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_DN_genes

and

for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst; 
	do 
	name=$(echo ${i::-10} | awk -F "/" '{print $NF}'); 
	grep -f $i enrichment/blastKOALA_crema/crema_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_crema/$name; 
	done

Raw outputs are available in:

enrichment/KO_vicia/*tsv here

enrichment/KO_crema/*tsv here

For crema a signalp analysis was carried out and then the number of protein with a secretion peptide was assigned to each module with:

for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst; 
	do 
	secreted=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -c -w SP); 
	total=$(wc -l $i | awk '{print $1}'); 
	genes=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -w SP | awk -F '.' '{print $1}' | tr '\n\t\r ' '\t'); 
	echo -e "$i\t$secreted\t$total\t$genes"; 
	done > single_genes_fun/crema_signalp.lst

The associated otput can be found in the folder single_genes_fun here.

Furthermore, for crema we manually characterised:

venom-associate proteins
antimicrobial peptides
aggression-related genes
genes involved in dopamine methabolism

Databases were respectively:

one based on VenomZone (insecta)
one using "taxonomy:"Insecta [50557]" keyword:"Antimicrobial [KW-0929]" AND reviewed:yes" on Uniprot
one manually curated based on the relevant litterature
one manually curated for genes involved in dopamine methabolism

Only the workflow for the antimicrobial peptides will be shown, as others are identical. After building the blastp db using:

makeblastdb -in antimicrobial_peptides.fasta -dbtype prot

blastp searches were carried out, using:

blastp -query annotations/crema/crema.Trinity.fasta.transdecoder.pep 
-db enrichment/crema_antimb/antimicrobial_peptides.fasta 
-evalue 1e-15 -qcov_hsp_perc 80 -outfmt "6 qseqid sseqid evalue qcovs" 
-max_target_seqs 1 -max_hsps 1 > enrichment/crema_antimb/crema_antimb_blastp.out

Subsequently, the DE statistics for

(a) the comparison between "control" and "treated" conditions and

(b) body parts

can be obtained respectively with:

sh scripts/extract_exp_val_crema_tissue_contrast.sh  enrichment/crema_antimb/crema_antimb_blastp.out 
> enrichment/crema_antimb/crema_antimb_expression_tissue_contrast.tsv

sh scripts/extract_exp_val_crema_treatm_contrast.sh  enrichment/crema_antimb/crema_antimb_blastp.out 
> enrichment/crema_antimb/crema_antimb_expression_treatm_contrast.tsv

The associated otputs can be found:

here for venom-associated proteins
here for antimicrobial peptides
here for aggression-related genes
here for genes involved in dopamine methabolism

Last, but not least, manually curated description of genes can be found here for crema and here for vicia.

main / 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

part_5.md

part_5.md

genes functional annotation

Files

part_5.md

Latest commit

History

part_5.md

File metadata and controls

genes functional annotation