bcftools norm with --multi-overlaps . outputs different variants depending on allele order in the input variant

Using
```
bcftools norm -m - --multi-overlaps .  test.vcf
```
on the following input
```
##fileformat=VCFv4.2
##reference=ref.fasta
##contig=<ID=1,length=51304566>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A	B
1	511	.	C	G,T	.	.	.	GT	1|0	0|1
1	511	.	C	T,G	.	.	.	GT	2|0	0|2
```
produces the following variants:
```
1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|.	.|.
1	511	.	C	T	.	.	.	GT	.|0	0|.
1	511	.	C	G	.	.	.	GT	1|.	.|1
```
I would expect both input records to produce the same output records, since they differ only in the order of the ALT alleles (with the genotype indices adjusted to match). `bcftools norm --atomize --atom-overlaps` does output the same variants regardless of the allele order in the input record:
```
1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|0	0|.
1	511	.	C	G	.	.	.	GT	1|0	0|1
1	511	.	C	T	.	.	.	GT	.|0	0|.
```

I tracked the difference down to this line of code:
https://github.com/samtools/bcftools/blob/develop/vcfnorm.c#L875
It keeps reference alleles as reference (`0`) only in the first record split from a multiallelic site; in every later split record they are set to missing (`.`).
Is there a specific reason for treating reference alleles in the first output record differently?
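
To make the difference concrete, here is a minimal Python model of the splitting behaviour as I understand it. It is only a sketch for illustration, not the actual `vcfnorm.c` logic: the function name `split_multiallelic` and the `ref_only_in_first` flag are hypothetical and exist only in this example.

```python
def split_multiallelic(alts, genotypes, ref_only_in_first=False):
    """Split one multiallelic record into one biallelic record per ALT.

    Hypothetical model, not bcftools code. Alleles pointing at a different
    ALT become missing ("."). With ref_only_in_first=True, reference alleles
    are kept as "0" only in the first split record (the observed behaviour);
    otherwise they are kept in every split record (the expected behaviour).
    """
    records = []
    for i, alt in enumerate(alts):
        target = str(i + 1)  # allele index of this ALT in the input record
        split_gts = []
        for gt in genotypes:
            new_alleles = []
            for allele in gt.split("|"):
                if allele == target:
                    new_alleles.append("1")  # this ALT becomes allele 1
                elif allele == "0" and (i == 0 or not ref_only_in_first):
                    new_alleles.append("0")  # reference stays reference
                else:
                    new_alleles.append(".")  # other ALT, or a dropped ref
            split_gts.append("|".join(new_alleles))
        records.append((alt, split_gts))
    return records


# The two input records from test.vcf: same alleles, different ALT order.
observed_a = split_multiallelic(["G", "T"], ["1|0", "0|1"], ref_only_in_first=True)
observed_b = split_multiallelic(["T", "G"], ["2|0", "0|2"], ref_only_in_first=True)
expected_a = split_multiallelic(["G", "T"], ["1|0", "0|1"])
expected_b = split_multiallelic(["T", "G"], ["2|0", "0|2"])

for label, recs in [("observed", observed_a + observed_b),
                    ("expected", expected_a + expected_b)]:
    print(label)
    for alt, gts in recs:
        print("  C", alt, *gts)
```

With `ref_only_in_first=True` the model reproduces the four `--multi-overlaps .` records shown above; with the flag off, both input records yield the same per-ALT genotypes regardless of allele order, which is what I would expect.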
