Snakemake pipeline for preprocessing DRAGEN BAM files and remapping them to the GRCh37 genome for follow-up mosaic variant calling algorithms.
The pipeline was originally written by Danny Antaki and re-implemented by Xin Xu, with help from Martin Breuss and Xiaoxu Yang.
Java for Linux.
Picard Tools from the Broad Institute.
BWA for mapping.
SAMtools for post-alignment processing.
GATK 3.8 for this pipeline.
Sambamba to speed up processing.
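Before launching the pipeline, it can help to confirm that the command-line tools listed above are installed. The following is a minimal sketch, assuming the executables are named `java`, `bwa`, `samtools`, and `sambamba` and are reachable through `PATH`. Both the names and the helper `find_missing_tools` are illustrative assumptions, not part of the pipeline. Picard and GATK are distributed as jar files, so they are not covered by this check.

```python
import shutil


def find_missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executable names are assumptions; adjust them to your installation.
missing = find_missing_tools(["java", "bwa", "samtools", "sambamba"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
```

If a tool is installed outside `PATH`, it will be reported as missing here even though the pipeline can still use it through an explicit path.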
A unique ID for each input DRAGEN BAM file.
The ID of the sample, which may be shared between several BAM files.
The path to the input DRAGEN BAM file.
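The three fields above (unique BAM ID, sample ID, BAM path) can be checked before a run. The sketch below assumes the input list is a tab-separated text file with exactly these three columns in this order and no header line. That layout, the function name `read_bam_list`, and the example IDs and paths are all assumptions made for illustration, so adapt them to the format the pipeline actually expects.

```python
import csv
import io
from collections import defaultdict


def read_bam_list(handle):
    """Parse rows of (bam_id, sample_id, bam_path) and group BAMs by sample ID."""
    seen_ids = set()
    by_sample = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        bam_id, sample_id, bam_path = (field.strip() for field in row)
        if bam_id in seen_ids:
            raise ValueError(f"duplicate BAM ID: {bam_id}")
        seen_ids.add(bam_id)
        by_sample[sample_id].append((bam_id, bam_path))
    return dict(by_sample)


# Hypothetical example: two BAMs share sample S1, one BAM belongs to S2.
example = io.StringIO(
    "S1_lane1\tS1\t/data/S1_lane1.bam\n"
    "S1_lane2\tS1\t/data/S1_lane2.bam\n"
    "S2_lane1\tS2\t/data/S2_lane1.bam\n"
)
groups = read_bam_list(example)
```

Grouping by sample ID makes it easy to see which BAM files share a sample, while the duplicate check enforces that every BAM ID is unique.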
Path to the input file list.
Path to the output directory.
Path to the scratch files. Note that the number of temporary files will be two to three times the total number of listed variants.
Path to the temporary files. The temporary files will be deleted after the process is successfully finished.
Path to your Java JRE.
Path to your picard.jar.
Path to your BWA.
Path to your SAMtools.
Path to your GenomeAnalysisTK.jar.
Path to your sambamba executable.
Your reference genome.
Mills indel VCF file (corresponding to your reference genome file).
List of common indels from the 1000 Genomes Project in VCF format (corresponding to your reference genome file).
SNP list from dbSNP in VCF format (corresponding to your reference genome file).
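Because most of the settings above are file system paths, a quick existence check can catch typos before any job is submitted. The sketch below assumes the settings have already been loaded into a Python dictionary. The key names (`out_dir`, `ref_fasta`, `picard`) and the helper `missing_paths` are hypothetical and only stand in for whatever names the pipeline's configuration actually uses.

```python
import os


def missing_paths(config, keys):
    """Return (key, value) pairs whose configured path is unset or does not exist."""
    problems = []
    for key in keys:
        value = config.get(key)
        if not value or not os.path.exists(value):
            problems.append((key, value))
    return problems


# Hypothetical configuration: key names and paths are placeholders.
config = {"out_dir": os.getcwd(), "ref_fasta": "/no/such/genome.fa"}
for key, value in missing_paths(config, ["out_dir", "ref_fasta", "picard"]):
    print(f"Check config entry {key!r}: {value!r}")
```

The function only tests that a path exists. It does not verify that a VCF matches the reference genome, which still needs to be confirmed separately.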
The final output BAM files are written to the recaled_bams folder.