Multi-allelic and complex variants lose zygosity info when atomized  #2239

@matbreno

Hello,
Using bcftools 1.20, I have a number of variants that seem to get the wrong genotype when normalized. You can find the test file attached:
test.vcf.gz

For example (INFO and FORMAT removed for readability):

chr14	105174100	.	GCCCGC	CGCCCCGC,GCCCG	18.6776	PASS	.	.	0/1

Becomes

bcftools norm -a --atom-overlaps . test.vcf.gz | bcftools norm -f hg19.fasta
chr14	105174100	.	G	C	18.6776	PASS	.	.	0/.
chr14	105174100	.	G	GGC	18.6776	PASS	.	.	0/.
chr14	105174104	.	GC	G	18.6776	PASS	.	.	0/0

While I would expect something like

chr14	105174100	.	G	C	18.6776	PASS	.	.	0/.
chr14	105174100	.	G	GGC	18.6776	PASS	.	.	0/1
chr14	105174104	.	GC	G	18.6776	PASS	.	.	0/0

The second record:

chr22	36744886	.	GCCCC	GGCT	578.003	PASS	.	.	0/1

Becomes

bcftools norm -a --atom-overlaps . test.vcf.gz | bcftools norm -f hg19.fasta
chr22	36744886	.	GC	G	578.003	PASS	.	.	0/.
chr22	36744887	.	C	G	578.003	PASS	.	.	0/1
chr22	36744889	.	C	T	578.003	PASS	.	.	0/.

while I would expect something like

chr22	36744886	.	GC	G	578.003	PASS	.	.	0/1
chr22	36744888	.	C	G	578.003	PASS	.	.	0/1
chr22	36744890	.	C	T	578.003	PASS	.	.	0/1
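To make it easier to spot affected records in a larger file, here is a minimal Python sketch that flags genotypes mixing a called allele with a missing one (such as `0/.`). It is not part of bcftools. The function names `partially_missing` and `flag_records` are made up for illustration, and the sketch assumes tab-separated VCF text with a single sample column. It reads the first sample subfield when FORMAT does not list `GT`, as in the shortened records above.

```python
import re

def partially_missing(gt):
    """True if the genotype mixes called and missing alleles, e.g. '0/.'."""
    alleles = re.split(r"[/|]", gt)
    return "." in alleles and any(a != "." for a in alleles)

def flag_records(vcf_text):
    """Return (chrom, pos, ref, alt, gt) for records with a partially missing GT.

    Assumption: only the first sample column (column 10) is inspected.
    """
    flagged = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 10:
            continue  # no sample column to inspect
        fmt_keys = cols[8].split(":")
        sample = cols[9].split(":")
        # Use the GT position from FORMAT if present, else the first subfield
        gt = sample[fmt_keys.index("GT")] if "GT" in fmt_keys else sample[0]
        if partially_missing(gt):
            flagged.append((cols[0], int(cols[1]), cols[3], cols[4], gt))
    return flagged

# The atomized chr22 records as reported above (INFO and FORMAT shortened to '.')
ATOMIZED = "\n".join("\t".join(r) for r in [
    ("chr22", "36744886", ".", "GC", "G", "578.003", "PASS", ".", ".", "0/."),
    ("chr22", "36744887", ".", "C", "G", "578.003", "PASS", ".", ".", "0/1"),
    ("chr22", "36744889", ".", "C", "T", "578.003", "PASS", ".", ".", "0/."),
])

if __name__ == "__main__":
    for record in flag_records(ATOMIZED):
        print(*record, sep="\t")
```

Run on the three atomized chr22 records, it flags the ones at 36744886 and 36744889, which are the two whose genotype I would expect to be `0/1`.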
