Description
I'm using bcftools 1.20 to split multiallelic variants in one .vcf, merge it with a second .vcf, and then recover the multiallelic variants. For some variants, reference alleles are being counted as missing in the final file. Here are the ACs for one position along the way:
file_A.vcf.gz
chr21:1000000:G:T,GT
allele: G GT T missing
AC: 20431 5617 2 0
file_B.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 1775 387 0
bcftools norm -a --atom-overlaps . --check-ref s -f reference_fasta.fa -m -both --multi-overlaps 0 -o file_A2.vcf.gz -O z file_A.vcf.gz
file_A2.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 20433 5617 0
chr21:1000000:G:T
allele: G T missing
AC: 26048 2 0
bcftools norm --check-ref s -f reference_fasta.fa -o file_B2.vcf.gz -O z file_B.vcf.gz
file_B2.vcf.gz
chr21:1000000:G:GT
allele: G GT missing
AC: 1775 387 0
(I create a file named 'file_list.txt' with the names of file_A2.vcf.gz and file_B2.vcf.gz)
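For concreteness, the list file can be created as below. This is a minimal sketch; any method that writes one file name per line should work just as well.

```shell
# Write the two normalized inputs to the list file, one name per line.
# The order here determines the sample order in the merged output.
printf '%s\n' file_A2.vcf.gz file_B2.vcf.gz > file_list.txt

# Show the list that is passed to `bcftools merge -l`.
cat file_list.txt
```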
bcftools merge -m none -O z -o file_C.vcf.gz -l file_list.txt
file_C.vcf.gz
chr21:1000000:G:T
allele: G T missing
AC: 26048 2 2162
chr21:1000000:G:GT
allele: G GT missing
AC: 22208 6004 0
bcftools norm -m +any -o file_C2.vcf.gz -O z file_C.vcf.gz
file_C2.vcf.gz
chr21:1000000:G:T,GT
allele: G T GT missing
AC: 20431 2 6004 1775
I'm seeing this same pattern (the ref alleles from file_B appear as missing in file_C2) for a number of variants. Is there a way that I can get bcftools to keep them as actual ref alleles? It's likely that I just need to use the correct options, but I've tried many combinations without success.
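As a quick sanity check on the numbers above, the following plain shell arithmetic (no bcftools involved; the variable names are just illustrative) shows how the counts line up: the missing count in file_C equals all of file_B's alleles at this position, and the missing count in file_C2 equals file_B's reference count.

```shell
# Allele counts copied from the listings above.
a2_ref=20433; a2_alt=5617   # file_A2, record with alleles G and GT
b_ref=1775;   b_alt=387     # file_B2, record with alleles G and GT

# All of file_B's alleles: matches the 2162 missing in file_C's G/T record.
echo "file_B total alleles: $((b_ref + b_alt))"

# The merged G/GT record simply sums both inputs (22208 and 6004).
echo "merged G count:  $((a2_ref + b_ref))"
echo "merged GT count: $((a2_alt + b_alt))"

# In file_C2 the reference count drops by exactly file_B's reference count,
# which is the 1775 reported as missing.
echo "file_C2 G count: $((a2_ref + b_ref - 2 - b_ref))"
```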