
NCycle analysis (draft)

rprops edited this page Jul 26, 2019 · 2 revisions

Context

These scripts can be used to screen the translated nucleotide sequences of a MAG (metagenome-assembled genome) for nitrogen cycling genes. They are based on the database and code from this repo: https://github.com/qichao1984/NCyc

If you use these scripts, please cite: Qichao Tu, Lu Lin, Lei Cheng, Ye Deng, Zhili He, NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes, Bioinformatics, Volume 35, Issue 6, 15 March 2019, Pages 1040–1048, https://doi.org/10.1093/bioinformatics/bty741

Preprocessing

Generate a sample_info.tsv file for each MAG that contains the MAG name and the number of protein sequences to search, separated by a tab.

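#!/bin/bash
set -e

# Each entry in ./genes_refGen/ is one MAG directory containing <MAG>.faa
for MAG_dir in ./genes_refGen/*
do
	# Strip the leading path to get the MAG name
	MAG=${MAG_dir##*/}
	echo "$MAG"
	# Count FASTA header lines (one per protein) and write "<MAG name><TAB><count>"
	grep -c "^>" "${MAG_dir}/${MAG}.faa" | awk -v mag_id="${MAG}" '{print mag_id"\t"$1}' > "${MAG_dir}/sample_info.tsv"
done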
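
To see what a sample_info.tsv line should look like, the counting step can be tried on a toy FASTA file. This is only a minimal sketch: the helper name `write_sample_info`, the file name `MAG_demo.faa` and the two toy sequences are made up for illustration and are not part of the NCyc repo.

```shell
#!/bin/bash
set -e

# Hypothetical helper: print "<MAG name><TAB><number of FASTA headers>" for one .faa file
write_sample_info() {
	local faa="$1"
	local mag n
	# MAG name = file name without directory and without the .faa extension
	mag=$(basename "$faa" .faa)
	# grep -c exits with status 1 when it counts zero lines, hence "|| true"
	n=$(grep -c '^>' "$faa" || true)
	printf '%s\t%s\n' "$mag" "$n"
}

# Toy input with two protein records (illustrative only, not real data)
tmpdir=$(mktemp -d)
printf '>prot_1\nMKT\n>prot_2\nMLS\n' > "$tmpdir/MAG_demo.faa"

write_sample_info "$tmpdir/MAG_demo.faa" > "$tmpdir/sample_info.tsv"
cat "$tmpdir/sample_info.tsv"
```

For the toy file this prints `MAG_demo` and `2` separated by a tab, which is the same two-column layout the loop above writes for each MAG.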

Run the NCyc script for each MAG

We don't want resampling to a minimum number of sequences, so we run the script for each MAG separately.

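#!/bin/bash
set -e

# Run this from /home/rprops/NCyc: the profiler reads the working directory (-d .)
# and the final rm uses relative paths.
# samples_refGen.tsv lists one MAG name per line.
for sample in $(cat samples_refGen.tsv)
do
	echo "$sample"
	# Stage the inputs for this MAG only, so it is profiled on its own
	cp "/home/rprops/NCyc/genes_refGen/${sample}/sample_info.tsv" /home/rprops/NCyc/
	cp "/home/rprops/NCyc/genes_refGen/${sample}/${sample}.faa" /home/rprops/NCyc/
	perl /home/rprops/NCyc/NCycProfiler.PL -d . -m diamond -f faa -s prot -si /home/rprops/NCyc/sample_info.tsv -o "NCyc_${sample}.tsv"
	# Remove the staged inputs and the intermediate diamond output before the next MAG
	rm "${sample}.faa" sample_info.tsv "${sample}.diamond"
done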
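
Because each MAG ends up with its own NCyc_<MAG>.tsv file, it can be handy to stack them into one long table afterwards. The sketch below is one possible way to do that and is not part of the NCyc code: the helper name `combine_profiles` and the output file name are hypothetical, and it assumes nothing about the column layout of the profiler output beyond it being line-based text. Any header lines in the per-MAG files would therefore also get the MAG name prepended.

```shell
#!/bin/bash
set -e

# Hypothetical helper: concatenate all NCyc_<MAG>.tsv files in a directory,
# prefixing every line with the MAG name taken from the file name
combine_profiles() {
	local dir="$1"
	local f mag
	for f in "$dir"/NCyc_*.tsv
	do
		# Skip the unexpanded pattern when no file matches
		[ -e "$f" ] || continue
		mag=$(basename "$f" .tsv)
		mag=${mag#NCyc_}
		awk -v mag_id="$mag" 'BEGIN { OFS = "\t" } { print mag_id, $0 }' "$f"
	done
}
```

A call such as `combine_profiles . > all_MAGs_NCyc.tsv` would write the stacked table; the output name deliberately does not start with `NCyc_`, so a second run does not pick it up as an input.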
