exoVar - whole exome somatic variant calling - snakemake workflow

-----------------------------------------------------------------------------

This pipeline calls somatic variants from tumor-normal paired whole-exome sequencing data using Mutect2 in scatter-gather mode for parallel processing. It identifies SNVs and indels, applies filtering to remove artifacts and germline variants, and calculates tumor mutational burden (TMB) and microsatellite instability (MSI) status as clinical biomarkers.

-----------------------------------------------------------------

Prerequisites before running the workflow

----------------------

clone the workflow:

git clone https://github.com/schimar/exoVar.git

----------------------

Resources

Make sure all of the necessary resource files are copied into the resources/ folder.

NOTE: the list of resource files is omitted here because it depends on your reference genome and target panel. Contact me to obtain a list of the other necessary resource files.

-----------------------------------------------------------------------------

Running the workflow:

define the paths
go to working directory
activate mamba environment (if not already done)
perform dry-run
run workflow

-----------------------------------------------------------------------------

NOTE: You need to run the bcl2fq rule separately, before running the rest of the workflow, because the fastq files must already be in their folder for the units/tsv to be created. To do this, comment out everything after the bcl-convert file in rule all (in the main snakefile, everything after line 47). Once bcl-convert is done, uncomment those lines again and you're golden for the second run!

1) define the paths:

(i.e. where is your raw data (the *.bcl files) and where do you want to write the data to?) Make sure you have the following info:

- runid (where to write to - consider writing to local ssd) 
- bcldir (location of Sequencing run folder - the final output will be copied into this folder/analysis/)
- SampleSheet.csv in bcldir

NOTE: if you are running on a panel other than <omitted.bed>, you currently need to specify two more parameters: one for the target regions (e.g. resources/<omitted.bed> for the chosen panel) and <analysis_path>, which specifies the name of the output folder.
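The inputs listed in step 1 can be checked before launching anything. The following is a minimal sketch, not part of the workflow: `check_run_inputs` is a hypothetical helper name, and it only tests what the list above states (a sequencing run folder that contains SampleSheet.csv, plus a non-empty runid).

```shell
# Hypothetical helper, not shipped with exoVar.
# check_run_inputs RUNID BCLDIR
check_run_inputs() {
    cri_runid="$1"
    cri_bcldir="$2"
    # Both values are needed later for --config runid=... bcldir=...
    if [ -z "$cri_runid" ] || [ -z "$cri_bcldir" ]; then
        echo "usage: check_run_inputs <runid> <bcldir>" >&2
        return 2
    fi
    # bcldir must be an existing sequencing run folder
    if [ ! -d "$cri_bcldir" ]; then
        echo "bcldir not found: $cri_bcldir" >&2
        return 1
    fi
    # SampleSheet.csv is expected directly inside bcldir
    if [ ! -f "$cri_bcldir/SampleSheet.csv" ]; then
        echo "SampleSheet.csv missing in: $cri_bcldir" >&2
        return 1
    fi
    echo "ok: runid=$cri_runid bcldir=$cri_bcldir"
}
```

Calling `check_run_inputs <output_path> <seqRun_path>` prints a single "ok" line on success and returns a non-zero status with a message on stderr otherwise.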

2) go to working directory

cd exoVar/workflow/

3) activate mamba environment

NOTE: this is the base environment, which contains snakemake v8.0. With micromamba (or any other flavor of mamba) installed, create it with

mamba create -c conda-forge -c bioconda -n snakemake snakemake=8.0

The remaining software packages will be installed in their own isolated environments (see the envs/ folder).

mamba activate snakemake

4) perform dry-run

smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<output_path> bcldir=<seqRun_path>

5) run workflow

smk -j<nthreads> --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<output_path> bcldir=<seqRun_path>
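Steps 4 and 5 can be chained so that the real run only starts if the dry-run succeeds. This is a sketch under assumptions: it assumes `smk` is a local alias for `snakemake` (the README does not say so), `run_exovar` and the `SMK` variable are hypothetical names, and the options are copied from the two commands above.

```shell
# Assumption: "smk" in this README is a local alias for snakemake.
# SMK is a hypothetical override variable, not part of the workflow.
SMK="${SMK:-snakemake}"

# run_exovar RUNID BCLDIR NTHREADS [extra snakemake options...]
run_exovar() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_exovar <runid> <bcldir> <nthreads> [options...]" >&2
        return 2
    fi
    runid="$1"; bcldir="$2"; nthreads="$3"
    shift 3
    # Step 4: dry-run; stop here if snakemake reports a problem.
    "$SMK" -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba \
        --config runid="$runid" bcldir="$bcldir" "$@" || return 1
    # Step 5: the real run with the requested number of threads.
    "$SMK" -j"$nthreads" --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba \
        --config runid="$runid" bcldir="$bcldir" "$@"
}
```

Any further arguments are passed through unchanged to both calls, so additional snakemake options can be appended after the thread count.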

-----------------------------------------------------------------------------

other useful features

get the rule graph (dependencies between rules) as rg.png:

smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<PATH/TO/runid> bcldir=<PATH/TO/bcldir/> --rulegraph --quiet --forceall | dot -Tpng > rg.png

get the directed acyclic graph (DAG) of jobs for multiple samples as dag.png:

smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<PATH/TO/runid> bcldir=<PATH/TO/bcldir/> --dag --quiet --forceall | dot -Tpng > dag.png
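Both graph commands pipe into `dot`, which is part of Graphviz and is not installed by the snakemake environment created above (as far as this README shows). A small sketch for checking that it is available first; `require_cmd` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a command is on PATH, otherwise report it.
# require_cmd NAME
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing command: $1" >&2
    return 1
}

# Example use before rendering rg.png or dag.png:
#   require_cmd dot || echo "install Graphviz to get the dot command" >&2
```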

If you had to cancel a run or encountered an error, append `--rerun-incomplete` to the respective smk command.
