This pipeline calls somatic variants from tumor-normal paired whole-exome sequencing data using Mutect2 in scatter-gather mode for parallel processing. It identifies SNVs and indels, applies filtering to remove artifacts and germline variants, and calculates tumor mutational burden (TMB) and microsatellite instability (MSI) status as clinical biomarkers.
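To illustrate the TMB biomarker: TMB is commonly reported as the number of somatic mutations per megabase of sequenced target territory. The sketch below assumes that generic definition. Which variants this pipeline actually counts, and which BED file it uses as the denominator, are not specified here. The function names `bed_territory_bp` and `tmb_per_mb` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the generic "mutations per megabase" TMB definition.
# Assumption: the pipeline's own variant selection and denominator may differ.

# Total territory (bp) of a BED file. BED is 0-based, half-open, so each
# interval covers end - start bases. Header/track lines are skipped.
# Assumes non-overlapping intervals (merge overlapping ones first).
bed_territory_bp() {
  awk -F'\t' '!/^(#|track|browser)/ && NF >= 3 { total += $3 - $2 }
              END { print total + 0 }' "$1"
}

# TMB = counted somatic variants / (territory in bp / 1e6), two decimals.
tmb_per_mb() {
  local n_variants=$1 territory_bp=$2
  awk -v n="$n_variants" -v bp="$territory_bp" 'BEGIN {
    if (bp <= 0) { print "territory must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", n / (bp / 1e6)
  }'
}
```

For example, `tmb_per_mb 150 "$(bed_territory_bp targets.bed)"`, where `targets.bed` is a placeholder for your panel's BED file.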
git clone https://github.com/schimar/exoVar.git
Make sure you have all of the necessary resource files copied into the resources/ folder.
NOTE: The list of resource files is omitted here, as they depend on your reference genome and target panel. Contact me to obtain a list of the other necessary resource files.
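Because the resource list is omitted, the following is only a minimal sketch for checking that whatever files you were given are present and non-empty before starting a run. The helper `check_resources` and any file names you pass to it are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given files are missing (or empty)
# in a resources folder. Returns non-zero if any file is missing.
check_resources() {
  local dir=$1
  shift
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_resources resources/ genome.fa targets.bed`, where both file names are placeholders for the files on your actual list.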
- define the paths
- go to working directory
- activate mamba environment (if not already done)
- perform dry-run
- run workflow
NOTE: You need to run the bcl2fq rule separately, before the rest of the workflow, because the fastq files must already be in their folder for units.tsv to be created. To do this, comment out every target after the bcl-convert file in rule all (in the main Snakefile, everything after line 47). Once bcl-convert has finished, uncomment those targets again and you're ready for the second run!
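Between the two runs, it can help to confirm that the first run actually produced FASTQ files before uncommenting the remaining targets. This sketch assumes the FASTQs are gzipped and end in `.fastq.gz` (the usual bcl-convert output); the functions are hypothetical, and you need to pass whatever folder the workflow writes its FASTQs to.

```shell
#!/usr/bin/env bash
# Count gzipped FASTQ files directly inside a directory (non-recursive).
count_fastqs() {
  find "$1" -maxdepth 1 -type f -name '*.fastq.gz' | wc -l | tr -d ' '
}

# Hypothetical guard: fail if the first (bcl2fq) run left no FASTQ files.
ready_for_second_run() {
  local n
  n=$(count_fastqs "$1")
  if [ "$n" -eq 0 ]; then
    echo "no *.fastq.gz in $1; run the bcl2fq step first" >&2
    return 1
  fi
  echo "found $n FASTQ files in $1"
}
```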
To define the paths (i.e. where your raw data, the *.bcl files, is located and where you want to write the output), make sure you have the following info:
- runid: where to write to (consider writing to a local SSD)
- bcldir: location of the sequencing run folder (the final output will be copied into bcldir/analysis/)
- SampleSheet.csv in bcldir
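A minimal pre-flight check for the two config values might look like this. It assumes only what is listed above: bcldir must exist and contain SampleSheet.csv, and runid must be writable. `check_run_paths` is a hypothetical helper, not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the runid and bcldir config values.
check_run_paths() {
  local runid=$1 bcldir=$2 parent
  if [ ! -d "$bcldir" ]; then
    echo "bcldir not found: $bcldir" >&2
    return 1
  fi
  if [ ! -f "$bcldir/SampleSheet.csv" ]; then
    echo "no SampleSheet.csv in $bcldir" >&2
    return 1
  fi
  # runid must be a writable directory, or not exist yet under a writable parent.
  if [ -e "$runid" ]; then
    parent=$runid
  else
    parent=$(dirname -- "$runid")
  fi
  if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
    echo "cannot write to $runid" >&2
    return 1
  fi
}
```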
NOTE: If you are running on a panel other than <omitted.bed>, you currently need to specify two more parameters, namely and <analysis_path>. specifies the target regions (e.g. resources/<omitted.bed> for the chosen panel) and <analysis_path> specifies the name of the output folder, which will be written in .
cd somaVar/workflow/
NOTE: This is the base environment with Snakemake v8.0. If you have micromamba (or any other flavor of mamba) installed, create it with:
mamba create -c conda-forge -c bioconda -n snakemake snakemake=8.0
The remaining software packages will be installed in their own isolated environments (see the envs/ folder).
mamba activate snakemake
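To confirm that the activated environment provides the expected Snakemake major version, you could compare the output of `snakemake --version` (a bare version string such as 8.0.0) against 8. `is_major_version` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeeds if a dotted version string (e.g. "8.0.0") has the wanted major version.
is_major_version() {
  local version=$1 want=$2
  [ "${version%%.*}" = "$want" ]
}

# With the real CLI, after `mamba activate snakemake`:
#   is_major_version "$(snakemake --version)" 8 || echo "expected Snakemake 8.x" >&2
```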
Dry-run:
smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<output_path>/ bcldir=<seqRun_path>/
Run the workflow (-j sets the number of parallel jobs):
smk -j<nthreads> --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<output_path>/ bcldir=<seqRun_path>/
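The commands above repeat the same conda flags. Assuming `smk` is a shell alias for `snakemake`, a small wrapper like the hypothetical `smk_run` below keeps those flags in one place. Extra arguments are placed before `--config`, because `--config` accepts several values and would otherwise swallow trailing positional arguments such as targets.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the flags repeated in the commands above.
# Assumption: `smk` is an alias for `snakemake`; override SMK_BIN if not.
SMK_BIN=${SMK_BIN:-snakemake}
ENV_PREFIX=${ENV_PREFIX:-/opt/envs/}

# Usage: smk_run <runid> <bcldir> [extra snakemake args...]
smk_run() {
  local runid=$1 bcldir=$2
  shift 2
  # Extra args go before --config, which takes several values and would
  # otherwise absorb trailing positional arguments.
  "$SMK_BIN" --use-conda --conda-prefix "$ENV_PREFIX" --conda-frontend mamba \
    "$@" --config runid="$runid" bcldir="$bcldir"
}
```

For example, `smk_run <output_path>/ <seqRun_path>/ -np` for the dry-run and `smk_run <output_path>/ <seqRun_path>/ -j<nthreads>` for the actual run.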
Optionally, render the rule graph and the job DAG as PNG files (this requires Graphviz for the dot command):
smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<PATH/TO/runid> bcldir=<PATH/TO/bcldir/> --rulegraph --quiet --forceall | dot -Tpng > rg.png
smk -np --use-conda --conda-prefix /opt/envs/ --conda-frontend mamba --config runid=<PATH/TO/runid> bcldir=<PATH/TO/bcldir/> --dag --quiet --forceall | dot -Tpng > dag.png