- Genome graph file in GFA format
- Reference haplotype genome that the graph is built on
- WGS fastq files
- SNV set (missing rate < 0.2, MAF > 0.05, bi-allelic sites)
├── raw_data
├── genome_index
└── logs

Please store your resequencing data in the raw_data/ folder, and the graph file and genome file in the genome_index/ folder. Script files, pipeline files, and configuration files can be stored wherever you like.
The config file needs to be in the same folder as the Snakefile.
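The layout above can be created up front with a few commands. This is a minimal sketch that assumes the three folders live directly under the working directory; the default value of `working_dir` is a hypothetical placeholder, not a path the workflow requires.

```shell
# Hypothetical project root; set working_dir beforehand to use your own location.
working_dir="${working_dir:-./snp_project}"

# Create the three folders shown in the layout above.
mkdir -p "${working_dir}/raw_data" \
         "${working_dir}/genome_index" \
         "${working_dir}/logs"

# List the result to confirm the folders exist.
ls -1 "${working_dir}"
```

After this, the resequencing data can be copied or symlinked into raw_data/ and the graph and genome files into genome_index/.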
# Absolute path to the pangenome gfa file
graph_gfa: "/genome_index/path/to/gfa"
# Code of the reference haplotype that the pangenome graph is built on
haplotype_code: ""
# Absolute path to reference haplotype genome file
haplotype_genome: "/genome_index/path/to/fa"

2.2 The fastq files may end with .fastq.gz or .fq.gz; specify the suffix of your fastq files if necessary.
# Fastq file suffix
fastq_suffix: ".fq.gz" # Default value is ".fq.gz"

In this workflow:
vg and GLnexus are installed via mamba. DeepVariant is executed through a Singularity container.
Please make sure to:
- Update the conda environment paths for vg and GLnexus.
- Set the absolute path to the Singularity image for DeepVariant.
If you use different installation methods, modify the Snakemake file accordingly to match your environment.
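Before running the workflow, it can help to confirm that the paths you entered actually exist. The sketch below is one possible check, not part of the pipeline: the variable names `vg_env`, `glnexus_env`, and `deepvariant_sif` and their placeholder values are assumptions, so replace them with the paths used in your own Snakefile.

```shell
# Hypothetical placeholder paths; replace with the values from your Snakefile.
vg_env="${vg_env:-/path/to/conda/envs/vg}"
glnexus_env="${glnexus_env:-/path/to/conda/envs/glnexus}"
deepvariant_sif="${deepvariant_sif:-/path/to/deepvariant.sif}"

# Report whether a path exists; return non-zero if it does not.
check_path() {
    local label="$1" path="$2"
    if [ -e "$path" ]; then
        echo "OK       ${label}: ${path}"
    else
        echo "MISSING  ${label}: ${path}"
        return 1
    fi
}

# "|| true" keeps the script going so that every path is reported.
check_path "vg conda environment" "$vg_env" || true
check_path "GLnexus conda environment" "$glnexus_env" || true
check_path "DeepVariant Singularity image" "$deepvariant_sif" || true

# The singularity executable must be on PATH for --use-singularity to work.
if command -v singularity >/dev/null 2>&1; then
    echo "OK       singularity found on PATH"
else
    echo "MISSING  singularity not found on PATH"
fi
```

Any line reported as MISSING points to a path that still needs to be updated.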
# Sample list; sample names should start with a letter.
sample:
- "sample1"
- "sample2"
- "sample3"
- "sample4"
- ...
- "samplen"

If you have a sample list text file (for example, sample.list), you can use the following command to add the samples to the config file:
# sample.list
sample1
sample2
sample3
sample4
# Add samples to the config file:
awk '{print " - \"" $0 "\""}' sample.list >> ${working_dir}/SNPcalling_config.yaml

Put the Snakefile and the configuration file in the same directory, then run the workflow.
For example:
snakemake \
--snakefile ${snakefile} \
--configfile ${configfile} \
-d ${working_dir} \
--cores ${cores_num} \
--use-conda \
--use-singularity \
--rerun-incomplete \
--nolock
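
Since sample names should start with a letter, it may be worth checking the sample list before launching the run. The helper below is a hypothetical sketch, assuming one sample name per line in sample.list; the function name `check_sample_names` is not part of the workflow.

```shell
# Print every sample name that does not start with a letter.
# Returns 0 when all names are valid, 1 otherwise.
check_sample_names() {
    local list_file="$1"
    # The first grep drops blank lines; the second prints names
    # whose first character is not a letter.
    if grep -v '^[[:space:]]*$' "$list_file" | grep -v '^[A-Za-z]'; then
        return 1
    fi
    return 0
}

# Run the check only if a sample list is present in the current directory.
if [ -f sample.list ]; then
    if check_sample_names sample.list; then
        echo "All sample names start with a letter."
    else
        echo "The names printed above need to be renamed." >&2
    fi
fi
```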