Closed
Description
Hi,
I am wondering whether this is a bug. bcftools norm (v1.17, but also other versions) changes the phased haplotypes at some positions when reconstructing multiallelic loci.
Here is an example of the input:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr19 40991369 rs8192709 C T . . . GT 1|0
chr19 41016810 rs3211371 C A . . . GT 0|0
chr19 41016810 rs3211371 C T . . . GT 0|1
I ran the following commands:
# on grch38
bcftools norm -m+ -c ws -f reference.fna.bgz input.vcf
# or the following just to collapse the multiallelic locus
bcftools norm -m+ -N input.vcf
I expected the following output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr19 40991369 rs8192709 C T . . . GT 1|0
chr19 41016810 rs3211371 C A,T . . . GT 0|2
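To spell out the expectation above, here is a minimal Python sketch of how the phased genotypes of biallelic records at one position could be merged while keeping each allele on its original haplotype. This is only an illustration of the expected result, not how bcftools implements `-m+`; the function name `merge_phased_gt` is hypothetical, and it assumes fully phased calls that contain only `0` and `1`, with no missing values and no two records placing an ALT on the same haplotype.

```python
def merge_phased_gt(records):
    """Merge phased GTs of biallelic records that share CHROM/POS/REF.

    records: list of (alt, gt) tuples in file order, e.g. [("A", "0|0"), ("T", "0|1")].
    Returns (merged_alt_field, merged_gt).
    Assumes phased calls made of 0/1 only (hypothetical simplification).
    """
    alts = [alt for alt, _ in records]
    merged = None
    for alt_index, (_, gt) in enumerate(records, start=1):
        calls = gt.split("|")
        if merged is None:
            # Start with every haplotype carrying the reference allele.
            merged = ["0"] * len(calls)
        for hap, call in enumerate(calls):
            if call == "1":
                # The ALT stays on the haplotype it was observed on,
                # renumbered to its index in the merged ALT list.
                merged[hap] = str(alt_index)
    return ",".join(alts), "|".join(merged)


alts, gt = merge_phased_gt([("A", "0|0"), ("T", "0|1")])
print(alts, gt)  # A,T 0|2
```

With the two chr19:41016810 records from the input, the T allele sits on the second haplotype, so the merged call under this model is `0|2`.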
But the actual output looks like the following:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr19 40991369 rs8192709 C T . . . GT 1|0
chr19 41016810 rs3211371 C A,T . . . GT 2|0
Why is the phased haplotype swapped after reconstructing multiallelic loci? The same thing happened when I split the multiallelic loci into biallelic records and then merged them back into multiallelic loci.
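One way to check for the swap is to expand a merged genotype back into one genotype per ALT allele and compare it with the original records. The sketch below does this under a simplified model in which every other allele is written as `0` after splitting; the helper name `split_phased_gt` is hypothetical, and the code is not meant to reproduce the exact output of `bcftools norm -m-`.

```python
def split_phased_gt(alts, gt):
    """Expand a phased multiallelic GT into per-ALT biallelic GTs.

    alts: comma-separated ALT field, e.g. "A,T".
    gt:   phased genotype using indices into that ALT list, e.g. "0|2".
    Simplified model (an assumption): alleles other than the one being
    split out are written as 0.
    """
    calls = gt.split("|")
    result = []
    for alt_index, alt in enumerate(alts.split(","), start=1):
        per_alt = ["1" if call == str(alt_index) else "0" for call in calls]
        result.append((alt, "|".join(per_alt)))
    return result


original = [("A", "0|0"), ("T", "0|1")]
print(split_phased_gt("A,T", "0|2") == original)  # True: phase preserved
print(split_phased_gt("A,T", "2|0") == original)  # False: T moved to the first haplotype
```

The expected output round-trips to the input records, while the reported output places T on the first haplotype instead of the second.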
Looking forward to hearing from you.