Skip to content

Latest commit

 

History

History
384 lines (217 loc) · 11 KB

part_5.md

File metadata and controls

384 lines (217 loc) · 11 KB

genes functional annotation

environiment: yaml.enrichment

aim: gain functional insight on the genes associated to crema - vicia interactions.


GO annotation has been carried out using

  • PANNZER using no filtering and defeault parameters

  • EggNOG-mapper using Insecta and Viridiplantae as Taxonomic scope and leveraging all annotations (including electronic ones).

PANNZER assigned 41331 and 99382 GO-terms respectively to crema and vicia (to 7930 and 18695 transcripts).

EggNOG-mapper assigned 391736 and 690388 GO-terms respectively to crema and vicia (to 5061 and 11897 transcripts).

Subsequent enrichment analysis legerage topGO and for this purpose the PANNZER and EggNOG-mapper outputs need to be reformatted using:

sh scripts/reformat_GOs.sh 
enrichment/PANNZER_crema/GO_crema_predictions.lst 
enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv 
enrichment/GO_crema/GO_crema_geneUniverse

and

sh scripts/reformat_GOs.sh 
enrichment/PANNZER_vicia/GO_vicia_predictions.lst 
enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv 
enrichment/GO_vicia/GO_vicia_geneUniverse

This script take as input:

  • PANNZER output

  • eggNOGmapper output

  • the output file name

In total we annotated with GOterms 8615 and 18698 transcripts for crema and vicia.


Since crema expression data encompasses two "tissues" the correct approach to generate an enrichment is to use as background only the genes which are found to be expressed in that particular tissue.

Thus for crema we need beforehand to create a tissue-specific gene universes, respectively for AD and CT:

for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_AD_vs_*_AD.DESeq2.DE_results | 
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_AD
for i in $(awk '{print $1}' abundances/crema/crema_deseq2_gene/RSEM_crema.filtered.gene.counts.matrix.A_CT_vs_*_CT.DESeq2.DE_results | 
sort -u | grep TRINITY); do grep $i enrichment/GO_crema/GO_crema_geneUniverse; done > enrichment/GO_crema/GO_crema_geneUniverse_CT 

Then we build two folders to split the DE results:

cd red/paint/abundances/crema/crema_deseq2_gene
mkdir AD CT
cp crema_A_AD_vs* AD/
cp crema_A_CT_vs* CT/

Then the GO enrichment analysisis is performed using the following commands:

Rscript scripts/GO_enrichment.Rscript enrichment/GO_vicia/GO_vicia_geneUniverse 
abundances/vicia/vicia_deseq_gene/ BP 5 classic fisher 0.01 enrichment/GO_vicia/
Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_CT 
abundances/crema/crema_deseq2_gene/CT/ BP 5 classic fisher 0.01 enrichment/GO_crema/
Rscript scripts/GO_enrichment.Rscript enrichment/GO_crema/GO_crema_geneUniverse_AD 
abundances/crema/crema_deseq2_gene/AD/ BP 5 classic fisher 0.01 enrichment/GO_crema/

The Rscript has several positional arguments:

  1. the gene universe
  2. the input folder containing the genes subsets considered
  3. the onyology considered (BP/MF/CC)
  4. the minimum node size
  5. the algorythm
  6. the statistical test
  7. the p value cutoff

Raw outputs are available in:

enrichment/GO_vicia/*tsv here

enrichment/GO_crema/*tsv here


EggNOG annotations for DE genes were extracted using:

for i in $(ls abundances/crema/crema_deseq2_gene/crema_*.lst); do 
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}'); 
sh scripts/extract_eggNOG_annotations.sh $i  enrichment/eggNOG_crema/crema_eggNOG-mapper_annotations.tsv 
> enrichment/eggNOG_crema/"$basename"_eggNOG_annotations.tsv; done
for i in $(ls abundances/vicia/vicia_deseq2_gene/vicia_*.lst); do 
basename=$(echo $i | awk -F "/" '{print $NF}' | awk -F "\." '{print $1}'); 
sh scripts/extract_eggNOG_annotations.sh $i  enrichment/eggNOG_vicia/vicia_eggNOG-mapper_annotations.tsv 
> enrichment/eggNOG_vicia/"$basename"_eggNOG_annotations.tsv; done

Raw outputs are available in:

enrichment/eggNOG_vicia/*tsv here

enrichment/eggNOG_crema/*tsv here


GO enrichment analyses were plotted using the Rscript plot_GO_enrichment.Rscript.

NB: this part has been carried out from any environiment due to conflicts.

The Rscript positional arguments:

  1. enriched goterms alog with their p value / red_enrichment.tsv files generated by GO_enrichment.Rscript
  2. similarity calculation method

For vicia:

Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_UP_BP_enrichment.tsv "orange" 0.01
Rscript scripts/plot_GO_enrichment.Rscript enrichment/GO_vicia/vicia_DN_BP_enrichment.tsv "lightblue" 0.01

For crema:

for i in enrichment/GO_crema/*_UP_BP_enrichment.tsv; 
	do Rscript plot_GO_enrichment.R $i "orange" 0.05; 
	done

for i in enrichment/GO_crema/*_DN_BP_enrichment.tsv; 
	do Rscript plot_GO_enrichment.R $i "lightblue" 0.05; 
	done

The full list can be found here - WIP.


Then the genes associated to GOterms of interest were extracted; here are some examples for:

  • vicia downregulated genes associated to GO:0042136 - neurotransmitter biosynthetic process:
Rscript scripts/GO_extract.Rscript enrichment/GO_vicia/enrichment.Rdata 
GO:0042136 abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst
[1] "TRINITY_DN36133_c0_g1" "TRINITY_DN833_c0_g1" 

Transcripts sequences were extracted as amminoacids sequences using:

for i in $(cat crema/abundances/crema/crema_deseq2_gene/crema_A_CT_vs_B_CT_UP_genes.lst); do 
sed -n -e "/$i/,/TRINITY/ p" annotations/crema/crema.Trinity.fasta.transdecoder.pep | head -n -1 | awk '{print $1}';
done

Additional KO annotation has been carried out using BlastKOALA leveraging the correct Taxonomy ID and the family_eukaryotes database. BlastKOALA assigned XXX and YYY KO-terms respectively to crema and vicia.

To further confirm the functional meaning of the observed transcriptional responses, we also leveraged th KO annotation generated by BlastKOALA.

grep -f abundances/vicia/vicia_deseq_gene/vicia_UP_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_UP_genes
grep -f abundances/vicia/vicia_deseq_gene/vicia_DN_genes.lst enrichment/blastKOALA_vicia/vicia_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_vicia/vicia_DN_genes

and

for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst; 
	do 
	name=$(echo ${i::-10} | awk -F "/" '{print $NF}'); 
	grep -f $i enrichment/blastKOALA_crema/crema_family-eukaryotes_blastKOALA.lst | grep K > enrichment/KO_crema/$name; 
	done

Raw outputs are available in:

enrichment/KO_vicia/*tsv here

enrichment/KO_crema/*tsv here


For crema a signalp analysis was carried out and then the number of protein with a secretion peptide was assigned to each module with:

for i in abundances/crema/crema_deseq2_gene/*A_*_vs_*lst; 
	do 
	secreted=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -c -w SP); 
	total=$(wc -l $i | awk '{print $1}'); 
	genes=$(grep -f $i enrichment/crema_signlp/crema_transdecoder_signalp.out | grep -w SP | awk -F '.' '{print $1}' | tr '\n\t\r ' '\t'); 
	echo -e "$i\t$secreted\t$total\t$genes"; 
	done > single_genes_fun/crema_signalp.lst

The associated otput can be found in the folder single_genes_fun here.


Furthermore, for crema we manually characterised:

  • venom-associate proteins
  • antimicrobial peptides
  • aggression-related genes
  • genes involved in dopamine methabolism

Databases were respectively:

  • one based on VenomZone (insecta)
  • one using "taxonomy:"Insecta [50557]" keyword:"Antimicrobial [KW-0929]" AND reviewed:yes" on Uniprot
  • one manually curated based on the relevant litterature
  • one manually curated for genes involved in dopamine methabolism

Only the workflow for the antimicrobial peptides will be shown, as others are identical. After building the blastp db using:

makeblastdb -in antimicrobial_peptides.fasta -dbtype prot

blastp searches were carried out, using:

blastp -query annotations/crema/crema.Trinity.fasta.transdecoder.pep 
-db enrichment/crema_antimb/antimicrobial_peptides.fasta 
-evalue 1e-15 -qcov_hsp_perc 80 -outfmt "6 qseqid sseqid evalue qcovs" 
-max_target_seqs 1 -max_hsps 1 > enrichment/crema_antimb/crema_antimb_blastp.out

Subsequently, the DE statistics for

(a) the comparison between "control" and "treated" conditions and

(b) body parts

can be obtained respectively with:

sh scripts/extract_exp_val_crema_tissue_contrast.sh  enrichment/crema_antimb/crema_antimb_blastp.out 
> enrichment/crema_antimb/crema_antimb_expression_tissue_contrast.tsv
sh scripts/extract_exp_val_crema_treatm_contrast.sh  enrichment/crema_antimb/crema_antimb_blastp.out 
> enrichment/crema_antimb/crema_antimb_expression_treatm_contrast.tsv

The associated otputs can be found:

  • here for venom-associated proteins
  • here for antimicrobial peptides
  • here for aggression-related genes
  • here for genes involved in dopamine methabolism

Last, but not least, manually curated description of genes can be found here for crema and here for vicia.


main / 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7