bcftools norm swapped phased haplotypes #1893

@BinglanLi

Hi,

I am wondering if this is a bug. bcftools norm (v1.17, and other versions as well) changes the phased haplotypes at some positions when reconstructing multiallelic loci.

Here is an example of the input

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
chr19	40991369	rs8192709	C	T	.	.	.	GT	1|0
chr19	41016810	rs3211371	C	A	.	.	.	GT	0|0
chr19	41016810	rs3211371	C	T	.	.	.	GT	0|1

I ran the following command

# on grch38
bcftools norm -m+ -c ws -f reference.fna.bgz input.vcf
# or the following just to collapse the multiallelic locus
bcftools norm -m+ -N input.vcf

I expect the following output

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
chr19	40991369	rs8192709	C	T	.	.	.	GT	1|0
chr19	41016810	rs3211371	C	A,T	.	.	.	GT	0|2
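
To make the expectation concrete, here is a minimal Python sketch of the per-haplotype merge I have in mind. It only illustrates the expected result and is not how bcftools implements `-m+`; the function name `merge_phased_gt` is hypothetical, and the sketch assumes fully phased genotypes with no missing alleles and no conflicting ALT calls on the same haplotype.

```python
def merge_phased_gt(records):
    """Merge biallelic records at one site into a single multiallelic genotype.

    records: list of (alt_allele, phased_gt) tuples in file order,
             e.g. [("A", "0|0"), ("T", "0|1")].
    Returns (alt_list, merged_gt). Hypothetical helper for illustration only.
    """
    alts = [alt for alt, _ in records]
    n_haps = len(records[0][1].split("|"))
    merged = ["0"] * n_haps  # start from REF on every haplotype

    # The k-th biallelic record contributes ALT index k in the merged record.
    for alt_index, (_, gt) in enumerate(records, start=1):
        for hap, allele in enumerate(gt.split("|")):
            if allele == "1":
                # Keep the allele on the haplotype where it was observed.
                merged[hap] = str(alt_index)
    return alts, "|".join(merged)


alts, gt = merge_phased_gt([("A", "0|0"), ("T", "0|1")])
print(",".join(alts), gt)  # A,T 0|2
```

Because each haplotype column is filled independently, the T allele stays on the second haplotype, which is why I expect 0|2 rather than 2|0.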

But the actual output looked like the following

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
chr19	40991369	rs8192709	C	T	.	.	.	GT	1|0
chr19	41016810	rs3211371	C	A,T	.	.	.	GT	2|0

Why are the phased haplotypes swapped after reconstructing the multiallelic locus? The same thing happens when I split multiallelic loci into biallelic records and then merge them back into multiallelic loci.

Looking forward to hearing from you.
