NCycle analysis (draft)
rprops edited this page Jul 26, 2019 · 2 revisions
These scripts can be used to screen the translated nucleotide sequences of a MAG for nitrogen cycling genes. They are based on the database and code from this repo: https://github.com/qichao1984/NCyc
If you use these scripts, please cite: Qichao Tu, Lu Lin, Lei Cheng, Ye Deng, Zhili He, NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes, Bioinformatics, Volume 35, Issue 6, 15 March 2019, Pages 1040–1048, https://doi.org/10.1093/bioinformatics/bty741
Generate a sample_info.tsv file for each MAG that contains the MAG name and the number of protein sequences to blast.
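#!/bin/bash
set -e
# Each entry in ./genes_refGen is one MAG directory holding <MAG>/<MAG>.faa
for MAG_list in ./genes_refGen/*
do
# Strip the leading path to get the MAG name
MAG=${MAG_list##*/}
echo "$MAG"
# Count FASTA headers (one per protein) and write "<MAG name><TAB><count>"
grep ">" "${MAG_list}/${MAG}.faa" | wc -l | awk -v mag_id="${MAG}" '{print mag_id"\t"$1}' > "${MAG_list}/sample_info.tsv"
done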
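The per-MAG loop further down reads MAG names from samples_refGen.tsv, but this page does not show how that file is created. A minimal sketch, assuming the file simply lists one MAG directory name per line and that only directories containing a matching <MAG>.faa should be listed (the function name list_mags is made up for illustration and is not part of NCyc):

```shell
#!/bin/bash
set -e

# Print the name of every MAG directory under the given root, one per line.
# Assumption: each MAG lives in <root>/<MAG>/ with its proteins in <MAG>.faa.
list_mags() {
    local root="$1"
    local dir mag
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # skip when the glob matches nothing
        mag=$(basename "$dir")
        # Only list MAGs that actually have a protein FASTA file
        if [ -f "$dir/$mag.faa" ]; then
            printf '%s\n' "$mag"
        fi
    done
}

# Usage (writes the list consumed by the per-MAG loop):
#   list_mags ./genes_refGen > samples_refGen.tsv
```

Skipping directories without a .faa file keeps the later cp step from failing under set -e on an incomplete MAG.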
We don't want the profiles resampled to a minimum number of sequences across MAGs, so we run the script for each MAG separately, each time with a sample_info.tsv that lists only that MAG.
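#!/bin/bash
set -e
# Run from /home/rprops/NCyc: "-d ." and the rm below both act on the current directory
for sample in $(cat samples_refGen.tsv)
do
echo "$sample"
# Stage this MAG's sample info and protein FASTA next to NCycProfiler.PL
cp "/home/rprops/NCyc/genes_refGen/${sample}/sample_info.tsv" /home/rprops/NCyc/
cp "/home/rprops/NCyc/genes_refGen/${sample}/${sample}.faa" /home/rprops/NCyc/
# Profile this MAG on its own
perl /home/rprops/NCyc/NCycProfiler.PL -d . -m diamond -f faa -s prot -si /home/rprops/NCyc/sample_info.tsv -o "NCyc_${sample}.tsv"
# Remove the staged inputs and the ${sample}.diamond file left behind by the run
rm "${sample}.faa" sample_info.tsv "${sample}.diamond"
done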
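The loop above leaves one NCyc_<MAG>.tsv per MAG. To inspect them together, they can be stacked into a single long table. This is only a sketch: it makes no assumption about the columns NCycProfiler.PL writes, so every line (including any header line) is copied unchanged behind the MAG name. The function name collect_profiles and the output name all_MAGs_NCyc.tsv are made up for illustration.

```shell
#!/bin/bash
set -e

# Concatenate every NCyc_<MAG>.tsv in a directory into one stream, with the
# MAG name (taken from the file name) prepended as a first tab-separated column.
collect_profiles() {
    local dir="$1"
    local file mag
    for file in "$dir"/NCyc_*.tsv; do
        [ -f "$file" ] || continue     # skip when the glob matches nothing
        mag=$(basename "$file" .tsv)   # NCyc_<MAG>.tsv -> NCyc_<MAG>
        mag=${mag#NCyc_}               # NCyc_<MAG>     -> <MAG>
        awk -v mag_id="$mag" '{print mag_id "\t" $0}' "$file"
    done
}

# Usage (the output name deliberately does not match NCyc_*.tsv, so a rerun
# does not pick up its own output):
#   collect_profiles . > all_MAGs_NCyc.tsv
```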