SNV calling based on genome graph

Dependent software

Snakemake
mamba (or conda)
Singularity
vg
DeepVariant (run through a Singularity container)
GLnexus

What to input

Genome graph file in GFA format
Reference haplotype genome that the graph was built on
Whole-genome sequencing (WGS) FASTQ files

What to output

SNV set filtered to bi-allelic sites with missing rate < 0.2 and minor allele frequency (MAF) > 0.05
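To make these filters concrete, here is a minimal sketch of how missing rate and MAF can be computed for a single VCF record. It only illustrates the definitions and is not the filtering command the pipeline runs. It assumes a bi-allelic site, that GT is the first FORMAT subfield (as the VCF specification requires when GT is present), and that a genotype containing any `.` counts as missing. `site_stats` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: missing rate and MAF for ONE bi-allelic VCF data line on stdin.
# Assumptions: GT is the first FORMAT subfield; any "." in GT marks the sample as missing.
site_stats() {
    awk -F '\t' '{
        n = 0; miss = 0; called = 0; alt = 0
        for (i = 10; i <= NF; i++) {           # sample columns start at column 10
            n++
            split($i, f, ":")                  # f[1] is the GT subfield
            if (f[1] ~ /\./) { miss++; continue }
            m = split(f[1], a, "[/|]")         # unphased (/) or phased (|) alleles
            for (j = 1; j <= m; j++) {
                called++
                if (a[j] != "0") alt++         # bi-allelic: any non-REF allele is ALT
            }
        }
        af = (called > 0) ? alt / called : 0   # ALT frequency among called alleles
        maf = (af <= 0.5) ? af : 1 - af        # minor allele frequency
        mr = (n > 0) ? miss / n : 0            # fraction of samples with missing GT
        printf "missing_rate=%.2f maf=%.2f\n", mr, maf
    }'
}
```

For example, a site with genotypes `0/0`, `0/1`, `0/0` and `./.` gives `missing_rate=0.25 maf=0.17`: its MAF passes the 0.05 threshold, but it would still be dropped by the missing-rate < 0.2 filter.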

Usage

1. Prepare your working directory

```
├── raw_data
├── genome_index
└── logs
```

Store your resequencing data in the `raw_data/` folder, and the graph file and genome file in the `genome_index/` folder. Script files, pipeline files and configuration files can be stored wherever you like, as long as the config file is in the same folder as the Snakefile (see step 2).
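A minimal sketch for creating this layout in one step. `setup_workdir` is a hypothetical helper name, not something shipped with the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the folders expected in the working directory.
setup_workdir() {
    local dir="$1"
    # -p creates parent folders as needed and does not fail if they already exist
    mkdir -p "${dir}/raw_data" "${dir}/genome_index" "${dir}/logs"
}
```

For example, run `setup_workdir /path/to/working_dir`, then copy your FASTQ files into `/path/to/working_dir/raw_data/`.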

2. Prepare the config file

The config file must be in the same folder as the Snakefile.

2.1 Move the graph file and the genome file to the `genome_index/` folder, then add their absolute paths to the config file, for example:

```yaml
# Absolute path to the pangenome GFA file
graph_gfa: "/genome_index/path/to/gfa"

# Code of the reference haplotype that the pangenome graph was built on
haplotype_code: ""

# Absolute path to the reference haplotype genome file (FASTA)
haplotype_genome: "/genome_index/path/to/fa"
```
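Before launching, it can help to confirm that these paths are absolute and point to real files. The sketch below uses two hypothetical helpers, `check_input_file` and `list_gfa_names`. The second prints the names stored in the GFA file (column 2 of path `P` lines and of GFA 1.1 walk `W` lines, where it is the sample name). Assuming `haplotype_code` corresponds to one of these names, which this README does not state, the list can help you pick the value.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the values in section 2.1.

# Succeeds only if the path is absolute and points to a non-empty file.
check_input_file() {
    local path="$1"
    case "$path" in
        /*) ;;                                   # absolute path: OK
        *) echo "not an absolute path: $path" >&2; return 1 ;;
    esac
    if [ ! -s "$path" ]; then
        echo "missing or empty file: $path" >&2
        return 1
    fi
}

# Lists the names in a GFA file: column 2 of P (path) lines and of
# W (walk, GFA 1.1) lines, where the latter holds the sample name.
list_gfa_names() {
    awk -F '\t' '$1 == "P" || $1 == "W" { print $2 }' "$1" | sort -u
}
```

For example, `check_input_file "/genome_index/path/to/fa"` fails with a message if the file is absent, and `list_gfa_names /genome_index/path/to/gfa` prints one name per line.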

2.2 FASTQ files may end in either `.fastq.gz` or `.fq.gz`. If yours do not use the default suffix, specify it in the config file:

```yaml
# FASTQ file suffix (default: ".fq.gz")
fastq_suffix: ".fq.gz"
```
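If you are unsure which suffix your files use, a minimal sketch like the following inspects a folder such as `raw_data/` and prints the value to put in `fastq_suffix`. It assumes all FASTQ files in the folder share a single suffix; `detect_fastq_suffix` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the FASTQ suffix used in a folder (e.g. raw_data/).
detect_fastq_suffix() {
    local dir="$1" n_fq n_fastq
    # Count files (or symlinks) with each suffix, without descending into subfolders
    n_fq=$(find "$dir" -maxdepth 1 -name '*.fq.gz' | wc -l)
    n_fastq=$(find "$dir" -maxdepth 1 -name '*.fastq.gz' | wc -l)
    if [ "$n_fq" -gt 0 ] && [ "$n_fastq" -gt 0 ]; then
        echo "both .fq.gz and .fastq.gz files found in $dir; use one suffix" >&2
        return 1
    elif [ "$n_fastq" -gt 0 ]; then
        echo ".fastq.gz"
    elif [ "$n_fq" -gt 0 ]; then
        echo ".fq.gz"
    else
        echo "no .fq.gz or .fastq.gz files found in $dir" >&2
        return 1
    fi
}
```

For example, `detect_fastq_suffix raw_data` prints `.fq.gz` or `.fastq.gz`, and fails if the folder mixes both.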

2.3 Software configuration

In this workflow:

vg and GLnexus are installed via mamba.
DeepVariant is executed through a Singularity container.

Please make sure to:

Update the conda environment paths for vg and GLnexus.
Set the absolute path to the Singularity image for DeepVariant.

If you install the software differently, modify the Snakefile accordingly to match your environment.
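A quick way to confirm that the tools can be found, as a sketch. The command names `vg`, `glnexus_cli` and `singularity` are assumptions about your installation (they are the usual executable names, but check yours), and `check_tools` is a hypothetical helper. It accepts either bare command names or absolute paths to executables, such as those in the `bin/` folder of a conda environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are available.
# Returns non-zero if at least one command cannot be found.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        # command -v resolves names on PATH and also accepts absolute paths
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_tools /path/to/vg_env/bin/vg /path/to/glnexus_env/bin/glnexus_cli singularity`, where the environment paths are placeholders. For the DeepVariant image, `[ -s /path/to/deepvariant.sif ]` checks that the file exists and is not empty.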

2.4 Fill in the sample names using the following format:

```yaml
# Sample list; each sample name must start with a letter.
sample:
    - "sample1"
    - "sample2"
    - "sample3"
    - "sample4"
    - ...
    - "samplen"
```

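If you keep the sample names in a text file with one name per line (for example `sample.list`), you can append them to the config file with the command below. Because `>>` appends to the end of the file, `sample:` should be the last key in the config file.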

```
# sample.list
sample1
sample2
sample3
sample4
```

```bash
# Add samples to the config file:
awk '{print "    - \"" $0 "\""}' sample.list >> "${working_dir}/SNPcalling_config.yaml"
```
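A slightly more defensive variant of the command above, as a sketch: it refuses names that do not start with a letter (the requirement stated in the config comment), skips blank lines and strips Windows line endings before appending. `append_samples` is a hypothetical helper; the lines it writes have the same format as the awk one-liner.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate sample names, then append them as YAML list items.
append_samples() {
    local list="$1" config="$2" bad
    # Non-blank lines (after removing CRs) that do not start with a letter
    bad=$(tr -d '\r' < "$list" | grep -v '^[[:space:]]*$' | grep -v '^[A-Za-z]')
    if [ -n "$bad" ]; then
        echo "sample names must start with a letter; offending lines:" >&2
        echo "$bad" >&2
        return 1
    fi
    # Same output format as the awk one-liner; NF skips blank lines
    tr -d '\r' < "$list" | awk 'NF { print "    - \"" $0 "\"" }' >> "$config"
}
```

For example, `append_samples sample.list "${working_dir}/SNPcalling_config.yaml"` leaves the config file untouched if any name is invalid.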

3. Submit the pipeline to an HPC cluster

Put the Snakefile and the configuration file in the same directory, then run it.

For example:

```bash
snakemake \
    --snakefile "${snakefile}" \
    --configfile "${configfile}" \
    -d "${working_dir}" \
    --cores "${cores_num}" \
    --use-conda \
    --use-singularity \
    --rerun-incomplete \
    --nolock
```
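A sketch of a wrapper that first performs a Snakemake dry run (`-n`, which lists the jobs without executing them) and only starts the real run if the dry run succeeds, using the same options as above. `run_pipeline` and the `SNAKEMAKE` variable are hypothetical; `SNAKEMAKE` lets you point to a specific snakemake executable and defaults to `snakemake` on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: dry run first, then the real run, with the options shown above.
run_pipeline() {
    local snakefile="$1" configfile="$2" working_dir="$3" cores="$4"
    local -a args=(
        --snakefile "$snakefile"
        --configfile "$configfile"
        -d "$working_dir"
        --cores "$cores"
        --use-conda
        --use-singularity
        --rerun-incomplete
        --nolock
    )
    # -n: dry run, only lists the jobs that would be executed; stop if it fails
    "${SNAKEMAKE:-snakemake}" "${args[@]}" -n || return 1
    # Real run with the same options
    "${SNAKEMAKE:-snakemake}" "${args[@]}"
}
```

For example: `run_pipeline "${snakefile}" "${configfile}" "${working_dir}" "${cores_num}"`.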

yaoxkkkkk/SNV-Graph-calling
