biociao/metaSeq

METAgenomic Beads Barcoding Quantification (METABBQ) pipeline.

This data processing pipeline recovers long bacterial/fungal amplicons from complex environmental samples. It mainly supports two experimental protocols: single-tube Long Fragment Reads (stLFR) and Rolling Circle Replication (RCR).

Installation

Prerequisites

  • python >= 3.6
  • perl >= 5
  • fastp - A modified version that implements a module to split stLFR barcodes.
  • Mash - A modified version adapted to stLFR data.
  • Snakemake - A Pythonic workflow system.
  • blast - The classic alignment tool for finding regions of similarity between biological sequences.
  • Assemblers
    • SPAdes - SPAdes Genome Assembler
    • MEGAHIT - An ultra-fast and memory-efficient NGS assembler

I recommend installing the above tools in a virtual environment via conda:

  1. Create the environment and install some of the tools:
conda create -n metaseq -c bioconda -c conda-forge snakemake pigz megahit blast
source activate metaseq
  2. Following their respective documentation, install fastp, SPAdes, community, etc. in the metaseq env.

Make sure the above commands (executables) can be found in your PATH.

Install
Clone the launcher, which initializes the working directory and calls the sub-functions.

cd /path/to/your/dir
git clone https://github.com/ZeweiSong/metaSeq.git
export PATH="/path/to/your/dir/metaSeq":$PATH

I have not yet written a testing module to check the above prerequisites. For now, you may need to test them yourself.
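
Since no built-in check exists, a minimal sketch like the following can report which executables are missing from PATH. The command names passed to it (python3, fastp, mash, snakemake, blastn, spades.py, megahit, pigz) are assumptions based on each tool's usual executable name, and check_tools is a hypothetical helper, not part of metabbq. Adjust the list to your installation.

```shell
#!/usr/bin/env bash
# check_tools is a hypothetical helper, not part of metabbq.
# It prints one "missing: NAME" line per command that is not on PATH
# and returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Assumed executable names; edit them to match your installation.
check_tools python3 perl fastp mash snakemake blastn spades.py megahit pigz \
  || echo "Install the tools listed above inside the metaseq env." >&2
```

Run it after `source activate metaseq` so that the check sees the same PATH the pipeline will use.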

Usage

Prepare configs

cd instance
metabbq cfg  

This command creates a default.cfg in your current directory. Modify it so that the launcher knows the required files and parameters.

Initiating a project

Prepare an input.list file that describes the sample names and the input sequence file paths.

metabbq -i input.list -c default.cfg -V

By default, metabbq creates a directory named {sample} with a sub-directory named input under it.

Run Quality-Control module

metabbq smk -j -np {sample}/clean/BB.stat
# -j executes the jobs in parallel on a suitable number of cores/threads
# -n means dry-run: it previews what needs to be run. Remove it to actually run the pipeline.

Run pre-binning assembly module

Select an assembler in the config file, then request the corresponding output file from the following:

metabbq smk -j -np {sample}/summary.BI.megahit.contig.fasta
metabbq smk -j -np {sample}/summary.BI.idba.contig.fasta
metabbq smk -j -np {sample}/summary.BI.spades.contig.fasta
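
The three targets above differ only in the assembler name, so the target path can be composed from the sample name and the chosen assembler. The sketch below illustrates this; bi_target is a hypothetical helper and "sampleA" is a hypothetical sample name, neither of which is part of metabbq. It only prints the dry-run command rather than executing it.

```shell
#!/usr/bin/env bash
# bi_target is a hypothetical helper, not part of metabbq.
# It prints the pre-binning summary target for a sample and an assembler,
# accepting only the three assembler names used in the commands above.
bi_target() {
  local sample=$1 assembler=$2
  case "$assembler" in
    megahit|idba|spades)
      printf '%s/summary.BI.%s.contig.fasta\n' "$sample" "$assembler" ;;
    *)
      echo "unknown assembler: $assembler" >&2
      return 1 ;;
  esac
}

# "sampleA" is a hypothetical sample name; print the dry-run command for it.
echo "metabbq smk -j -np $(bi_target sampleA megahit)"
```

The assembler passed here should match the one selected in the config file.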

Troubleshooting

Feedback is welcome; please submit it on the issue page.
