-
Notifications
You must be signed in to change notification settings - Fork 36
Diploid assembly based SV detection using cuteSV
JiangTao edited this page Jan 15, 2022
·
7 revisions
With the well acquisition of haplotype-resolved or phased genome assembly, it has great potential for detection of phased SVs. We wrote a script that post-process the typically cuteSV callsets from assembly alignments to generate the diploid-assembly-based SV calls. Here we show a simple demo about the performance about this new function. A step-by-step operation is listed below.
- It is beneficial to set --max_size as -1 that enables cuteSV to report full-length SVs.
- Setting --max_split_parts as -1 will apply the alignments with all split segments for SV calling.
- Create directory structure
mkdir -p assemblies ref alns calls
- Download assembly datasets
We collect two latest published NA24385 (also called HG002) human genome assembly datasets from Wenger et al and Garg et al respectively. Both of the two assemblies were used 29.7x PacBio CCS data. The former applied Canu for contig assembly and Arrow for polishing. The latter applied DipAsm for contig assembly and a 28.5x Hi-C dataset for scaffolding.
echo "Download maternal haplotype from Wenger."
curl-s https://downloads.pacbcloud.com/public/publications/2019-HG002-CCS/asm/HG002_canu_maternal.fasta \
> assemblies/HG002_canu_maternal.fasta
echo "Download paternal haplotype from Wenger."
curl-s https://downloads.pacbcloud.com/public/publications/2019-HG002-CCS/asm/HG002_canu_paternal.fasta \
> assemblies/HG002_canu_paternal.fasta
echo "Download haplotype H1 from Garg."
curl-s ftp://ftp.dfci.harvard.edu/pub/hli/whdenovo/asm/NA24385-denovo-H1.fa.gz \
> assemblies/NA24385-denovo-H1.fa.gz
echo "Download haplotype H2 from Garg."
curl-s ftp://ftp.dfci.harvard.edu/pub/hli/whdenovo/asm/NA24385-denovo-H2.fa.gz \
> assemblies/NA24385-denovo-H2.fa.gz
- Download hg19 reference with decoys and map non-ACGT characters to N
curl -s ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
> ref/human_hs37d5.fasta.gz
gunzip ref/human_hs37d5.fasta.gz
sed -i '/^[^>]/ y/BDEFHIJKLMNOPQRSUVWXYZbdefhijklmnopqrsuvwxyz/NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN/' \
ref/human_hs37d5.fasta
- Add haplotype-unique tag
cat <(cat assemblies/HG002_canu_maternal.fasta | sed 's/>/>cutesvh1_/g') \
<(cat assemblies/HG002_canu_paternal.fasta | sed 's/>/>cutesvh2_/g') \
> assemblies/HG002_wenger.fasta
cat <(zcat assemblies/NA24385-denovo-H1.fa.gz | sed 's/>/>cutesvh1_/g') \
<(zcat assemblies/NA24385-denovo-H2.fa.gz | sed 's/>/>cutesvh2_/g') \
> assemblies/HG002_garg.fasta
- Contigs alignment
conda install minimap2 -c bioconda
minimap2 --paf-no-hit -a -x asm5 --cs -r 2k -t 8 ref/human_hs37d5.fasta assemblies/HG002_wenger.fasta \
> alns/HG002_wenger.sam
minimap2 --paf-no-hit -a -x asm5 --cs -r 2k -t 8 ref/human_hs37d5.fasta assemblies/HG002_garg.fasta \
> alns/HG002_garg.sam
- Alignments post-processing
conda install samtools -c bioconda
samtools sort -@ 8 -o alns/HG002_wenger.bam alns/HG002_wenger.sam
samtools index -@ 8 alns/HG002_wenger.bam
samtools sort -@ 8 -o alns/HG002_garg.bam alns/HG002_garg.sam
samtools index -@ 8 alns/HG002_garg.bam
- Install cuteSV
git clone https://github.com/tjiangHIT/cuteSV.git
cd cuteSV && python3 setup install && cd ..
- Diploid-assembly-based SV calling
cuteSV alns/HG002_wenger.bam ref/human_hs37d5.fasta calls/HG002_wenger.vcf ./ \
-s 1 --genotype --report_readid -p -1 -mi 500 -md 500 \
--max_cluster_bias_INS 1000 \
--diff_ratio_merging_INS 0.9 \
--max_cluster_bias_DEL 1000 \
--diff_ratio_merging_DEL 0.5
python3 cuteSV/src/cuteSV/diploid_calling.py calls/HG002_wenger.vcf calls/HG002_wenger_final.vcf
cuteSV alns/HG002_garg.bam ref/human_hs37d5.fasta calls/HG002_garg.vcf ./ \
-s 1 --genotype --report_readid -p -1 -mi 500 -md 500 \
--max_cluster_bias_INS 1000 \
--diff_ratio_merging_INS 0.9 \
--max_cluster_bias_DEL 1000 \
--diff_ratio_merging_DEL 0.5
python3 cuteSV/src/cuteSV/diploid_calling.py calls/HG002_garg.vcf calls/HG002_garg_final.vcf
Assembly | F1 % | Precision % | Recall % | GT-F1 % | GT-Precision % | GT-Recall % |
---|---|---|---|---|---|---|
Wenger et al | 92.18 | 93.32 | 91.08 | 84.09 | 79.17 | 89.65 |
Garg et al | 96.09 | 95.61 | 96.58 | 95.29 | 94.09 | 96.52 |