This is a data processing pipeline for obtaining long bacterial/fungal amplicons from complex environmental samples. Two experimental methods are mainly supported: single-tube Long Fragment Reads (stLFR) and Rolling Circle Replication (RCR).
Prerequisites
- python >= 3.6
- perl >= 5
- fastp - a modified version that implements a module to split the stLFR barcodes.
- Mash - a modified version adapted to stLFR data.
- Snakemake - a pythonic workflow system.
- blast - the classic alignment tool for finding regions of similarity between biological sequences.
- Assembly tools (MEGAHIT, IDBA or SPAdes)
I recommend installing the above tools in a virtual environment via conda:
- Create the environment and install some of the tools:
conda create -n metaseq -c bioconda -c conda-forge snakemake pigz megahit blast
source activate metaseq
- Following each tool's own documentation, install fastp, SPAdes, community, etc. under the env metaseq.
Make sure the above commands (executables) can be found in the PATH.
Install
Clone the repository. Its launcher initializes the working directory and calls the sub-functions.
cd /path/to/your/dir
git clone https://github.com/ZeweiSong/metaSeq.git
export PATH="/path/to/your/dir/metaSeq":$PATH
I haven't yet written a testing module to check the above prerequisites. For now, you may need to verify them yourself.
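A manual check can be as simple as asking the shell whether each executable is on the PATH. The following is a minimal sketch, not part of metaSeq: check_tools is a hypothetical helper, and the executable names in the usage line are assumptions (for example, the modified fastp and Mash builds may be installed under different names).

```shell
# check_tools: hypothetical helper, not shipped with metaSeq.
# Prints the name of every executable that cannot be found in PATH
# and returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After activating the metaseq env, it could be called as check_tools python3 perl fastp mash snakemake blastn megahit pigz metabbq. Any name it prints still needs to be installed or added to the PATH. Note that this only confirms the executables exist; it does not verify that fastp and Mash are the modified versions.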
Prepare configs
cd instance
metabbq cfg
This command creates a default.cfg in your current dir.
You should modify it to tell the launcher which files and parameters are required.
Initiating a project
Prepare an input.list file that describes each sample name and the path to its input sequence file.
metabbq -i input.list -c default.cfg -V
By default, metabbq will create a directory named {sample} with a sub-directory named input under it.
metabbq smk -j -np {sample}/clean/BB.stat
# -j runs the jobs in parallel, using the available cores/threads
# -n means dry-run: it only previews "what needs to be run". Remove it to actually run the pipeline.
You need to select an assembly tool in the configuration file and request the corresponding output file name, as follows:
metabbq smk -j -np {sample}/summary.BI.megahit.contig.fasta
metabbq smk -j -np {sample}/summary.BI.idba.contig.fasta
metabbq smk -j -np {sample}/summary.BI.spades.contig.fasta
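The three targets above differ only in the assembler name, so the target can be built from the sample and the assembler chosen in the configuration file. This is a small sketch under that assumption: assembly_target is a hypothetical helper and sampleA is a placeholder sample name, neither is part of metaSeq.

```shell
# assembly_target: hypothetical helper, not shipped with metaSeq.
# Usage: assembly_target <sample> <assembler>
# Prints the summary contig target for one of the three assembler
# names used above and rejects anything else.
assembly_target() {
    case "$2" in
        megahit|idba|spades)
            printf '%s/summary.BI.%s.contig.fasta\n' "$1" "$2"
            ;;
        *)
            echo "unknown assembler: $2" >&2
            return 1
            ;;
    esac
}
```

A dry-run for a sample called sampleA with megahit would then be metabbq smk -j -np "$(assembly_target sampleA megahit)". The assembler passed here should match the one selected in the configuration file.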
Feedback is welcome; please submit it on the issue page.