This pipeline is inspired by F.J. Yang (Bioinplant Lab, Zhejiang University).
- Reference genome file
- WGS fastq files
- SNV core set (missing rate < 0.2, MAF > 0.05, bi-allelic sites)
├── raw_data
├── genome_index
└── logs

Please store your resequencing data in the raw_data/ folder and the genome file in the genome_index/ folder. Script files, pipeline files, and configuration files can be stored wherever you like.
The config file must be in the same folder as the Snakefile.
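The folder layout described above can be created in one step. This is a minimal sketch: the `working_dir` variable mirrors the one used elsewhere in this README, and the default `./snv_project` is only a placeholder, not a name the pipeline requires.

```shell
# Hypothetical default location; set working_dir beforehand to use your own.
working_dir="${working_dir:-./snv_project}"

# -p creates missing parent folders and does not fail if a folder already exists
mkdir -p "${working_dir}/raw_data" \
         "${working_dir}/genome_index" \
         "${working_dir}/logs"
```

After this, copy the FASTQ files into raw_data/ and the genome FASTA into genome_index/.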
# Absolute path to the genome fasta file
ref: "/workingdir/genome_index/genome.fasta"

Also list the chromosome IDs. Variants are genotyped per chromosome so that the process can run in parallel.
chromosomes:
- Chr01_hap1
- Chr02_hap1
- Chr03_hap1
- ...
- Chrnn_hap1

You can use the following command to generate the list:
cat /workingdir/genome_index/genome.fasta | grep ">" | awk '{gsub(/^>/, " - "); print}'

2.2 FASTQ file names may end with .fastq.gz or .fq.gz. Specify the suffix of your FASTQ files if necessary.
# Fastq file suffix
fastq_suffix: ".fq.gz" # Default value is ".fq.gz"

# Sample list; sample names should start with a letter.
sample:
- "sample1"
- "sample2"
- "sample3"
- "sample4"
- ...
- "samplen"

If you have the sample names in a text file (for example sample.list), you can append them to the config file with the following command:
# sample.list
sample1
sample2
sample3
sample4
# Add samples to the config file:
awk '{print " - \"" $0 "\""}' sample.list >> "${working_dir}/SNVcalling_config.yaml"

Put the Snakefile and the configuration file in the same directory and run the pipeline.
For example:
snakemake \
--snakefile SNVcalling.smk \
--configfile SNVcalling_config.yaml \
-d ./ \
--use-conda \
--use-singularity \
--nolock \
--rerun-incomplete \
--restart-times 3 \
--executor slurm \
--default-resources \
--jobs 999
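
Before submitting the jobs, it may help to confirm that every sample in the list has reads in raw_data/. The function below is a rough sketch, not part of the pipeline: `check_samples` is a hypothetical helper, and it assumes that FASTQ file names begin with the sample name and end with the configured suffix (for example sample1_1.fq.gz), which this README does not state. It also flags names that do not start with a letter.

```shell
# Hypothetical pre-flight helper; not shipped with the pipeline.
# Assumption: FASTQ files are named <sample>*<suffix>, e.g. sample1_1.fq.gz
check_samples() {
    sample_list=$1   # text file, one sample name per line
    raw_dir=$2       # folder holding the FASTQ files
    suffix=$3        # should match fastq_suffix in the config file
    problems=0

    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue            # skip blank lines
        case "$sample" in
            [A-Za-z]*) ;;                        # name starts with a letter
            *) echo "name does not start with a letter: ${sample}"
               problems=$((problems + 1)) ;;
        esac
        # An unmatched glob stays literal, so the -e test fails when no file exists
        set -- "${raw_dir}/${sample}"*"${suffix}"
        if [ ! -e "$1" ]; then
            echo "missing FASTQ for: ${sample}"
            problems=$((problems + 1))
        fi
    done < "$sample_list"

    [ "$problems" -eq 0 ]   # exit status 0 only when every sample passed
}
```

A possible call, run from the working directory: `check_samples sample.list raw_data ".fq.gz" && echo "all samples look fine"`. Because the match is a simple prefix glob, a name such as sample1 is also satisfied by files belonging to sample10, so treat a clean result as a quick sanity check rather than a guarantee.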