Snakemake workflow for yield-normalized comparative genome-centric metagenomics
- The workflow is designed to take metagenomic sequencing datasets, subsample the reads to specified sequencing depths, and perform microbial genome recovery with mmlong2
- Additional analyses then investigate possible reasons for differences in genome recovery efficiency, including micro-diversity analysis, assessment of non-prokaryotic DNA, and community composition analysis
- Multiple bioinformatics tools are used in each analysis to account for possible tool-specific biases
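Yield normalization means that every dataset is reduced to the same amount of sequence before genome recovery, so differences in recovery are not simply caused by one sample being sequenced more deeply. The subsampling tool the workflow actually uses is not stated here, so the following is only a minimal pure-Python sketch of the idea. It assumes reads are already held in memory as `(read_id, sequence)` tuples, and `subsample_to_yield` is a hypothetical name. Reads are drawn in a seeded random order, without replacement, until the target number of bases is reached.

```python
import random
from typing import List, Tuple

# Hypothetical in-memory read representation: (read_id, sequence)
Read = Tuple[str, str]


def subsample_to_yield(reads: List[Read], target_bases: int, seed: int = 42) -> List[Read]:
    """Randomly keep reads until their combined length reaches target_bases.

    Illustrative sketch of yield normalization, not the workflow's own code.
    Raises ValueError if the dataset is smaller than the requested yield.
    """
    total = sum(len(seq) for _, seq in reads)
    if total < target_bases:
        raise ValueError(f"dataset yield {total} bp is below target {target_bases} bp")

    rng = random.Random(seed)   # seeded so repeated runs select the same subset
    order = list(range(len(reads)))
    rng.shuffle(order)          # random read order, sampling without replacement

    chosen, kept_bases = [], 0
    for idx in order:
        if kept_bases >= target_bases:
            break
        chosen.append(idx)
        kept_bases += len(reads[idx][1])

    chosen.sort()               # restore the original read order for output
    return [reads[i] for i in chosen]


if __name__ == "__main__":
    toy = [(f"r{i}", "A" * 1000) for i in range(100)]  # 100 kbp toy dataset
    sub = subsample_to_yield(toy, target_bases=25_000)
    print(len(sub), sum(len(s) for _, s in sub))        # 25 25000
```

Because whole reads are kept, the retained yield can exceed the target by less than the length of the last read added, which is usually small relative to multi-gigabase targets. A production implementation would stream FASTQ records rather than load every read into memory.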
- Download the repo and update the `config/config.yaml` file to point to the correct Conda environments, read datasets, and databases
  - The workflow can use a local Conda executable and environments, bypassing some Snakemake compatibility issues
- Update the `mmcomp.sh` script to match your server and job scheduler setup, as needed
  - It is recommended to run the workflow with multiple retries enabled, as each retry is submitted with increased resource allocation
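Snakemake lets a rule's `resources` be given as a function that receives the retry counter `attempt` (1 on the first try, 2 on the first retry, and so on), which is the usual way to make each retry request more resources. How `mmcomp.sh` and the workflow's rules actually scale their requests is not shown here; the helpers below are a hypothetical sketch, and the names `scaled_mem_mb` and `scaled_runtime` are assumptions for illustration.

```python
# Hypothetical resource helpers for a Snakefile; not taken from this workflow.
# Snakemake matches callable resource arguments by name, so the inner
# functions use the parameter names `wildcards` and `attempt`.


def scaled_mem_mb(base_mb: int, max_mb: int):
    """Return a resource callable that multiplies base_mb by the attempt number."""
    def mem_mb(wildcards, attempt):
        # Cap the request so a retry never asks for more than a node can offer
        return min(base_mb * attempt, max_mb)
    return mem_mb


def scaled_runtime(base_min: int):
    """Return a resource callable whose runtime (minutes) doubles on each retry."""
    def runtime(wildcards, attempt):
        return base_min * 2 ** (attempt - 1)
    return runtime
```

Inside a rule, these would be referenced under `resources:` as, for example, `mem_mb=scaled_mem_mb(50_000, 500_000)` and `runtime=scaled_runtime(240)` (illustrative values), with retries enabled via `snakemake --retries 3` (`--restart-times` in older Snakemake releases). Capping memory keeps late retries schedulable, while doubling runtime quickly recovers from jobs killed by wall-time limits.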