
bcftools merge incorrectly drops symbolic ALTs via vertical merge #2362

@ACEnglish

bcftools merge does not take the END position of symbolic variants into account. Records that share POS, REF and ALT but have different END values are combined into a single record, i.e. a vertical merge.
The documentation states that the command isn't intended for vertical merges, which I take to mean that it will not perform one. In practice it sometimes does.

Example

A.vcf

##fileformat=VCFv4.1
##contig=<ID=chr1,length=248956422>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
chr1	147022730	SV1	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1

B.vcf

##fileformat=VCFv4.1
##contig=<ID=chr1,length=248956422>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Other
chr1	147022730	SV2	N	<DEL>	.	PASS	SVLEN=-990414;END=148013144	GT	1/1

bcftools merge --no-index -m none A.vcf B.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_mergeVersion=1.21+htslib-1.21
##bcftools_mergeCommand=merge --no-index -m none A.vcf B.vcf; Date=Tue Jan 28 15:57:35 2025
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample	Other
chr1	147022730	SV1;SV2	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1	1/1
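
As a way to check whether a set of inputs is affected before merging, here is a minimal sketch in plain Python. It assumes uncompressed VCF text and one ALT per record; the helper names (parse_info, find_end_conflicts) are hypothetical and not part of bcftools. It reports every CHROM/POS/REF/ALT key that appears with more than one END value, which is the situation shown in the output above.

```python
from collections import defaultdict


def parse_info(info):
    """Turn 'SVLEN=-5;END=10' into {'SVLEN': '-5', 'END': '10'}; flags map to True."""
    out = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def find_end_conflicts(vcf_texts):
    """Return {(chrom, pos, ref, alt): [END, ...]} for keys seen with more than one END."""
    ends = defaultdict(set)
    for text in vcf_texts:
        for line in text.splitlines():
            # Skip blank lines and header lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            chrom, pos, ref, alt, info = (
                fields[0], int(fields[1]), fields[3], fields[4], fields[7]
            )
            # Records without INFO/END are tracked as END=None.
            ends[(chrom, pos, ref, alt)].add(parse_info(info).get("END"))
    return {key: sorted(vals, key=str) for key, vals in ends.items() if len(vals) > 1}


# The two records from A.vcf and B.vcf above.
a = "chr1\t147022730\tSV1\tN\t<DEL>\t.\tPASS\tSVLEN=-570334;END=147593064\tGT\t0/1"
b = "chr1\t147022730\tSV2\tN\t<DEL>\t.\tPASS\tSVLEN=-990414;END=148013144\tGT\t1/1"
conflicts = find_end_conflicts([a, b])
for key, values in conflicts.items():
    print(key, values)
```

Any key reported here is one where the merged output would carry the END and SVLEN of only one of the input records.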

A temporary workaround is to use -m id, which keeps the two records separate:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_mergeVersion=1.21+htslib-1.21
##bcftools_mergeCommand=merge --no-index -m id A.vcf B.vcf; Date=Tue Jan 28 15:58:26 2025
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample	Other
chr1	147022730	SV1	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1	./.
chr1	147022730	SV2	N	<DEL>	.	PASS	SVLEN=-990414;END=148013144	GT	./.	1/1

However, assigning unique IDs to variants across files/experiments is non-trivial.
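
One possible way to get IDs that are at least consistent across files is to derive them from the record itself. The sketch below is an illustration only, not a bcftools feature: the function names are hypothetical, and the CHROM_POS_END_ALT scheme is an assumption. It makes identical records share an ID and records with different END values get different IDs. It does not address the harder problem of two callers reporting slightly different breakpoints for the same event.

```python
def positional_id(vcf_line):
    """Build an ID of the form CHROM_POS_END_ALT from one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, alt, info = fields[0], fields[1], fields[4], fields[7]
    end = "."  # placeholder when the record has no INFO/END
    for item in info.split(";"):
        if item.startswith("END="):
            end = item[len("END="):]
    # Drop the angle brackets of symbolic alleles such as <DEL>.
    alt_tag = alt.strip("<>")
    return "_".join([chrom, pos, end, alt_tag])


def with_positional_id(vcf_line):
    """Return the same data line with the ID column (column 3) replaced."""
    fields = vcf_line.rstrip("\n").split("\t")
    fields[2] = positional_id(vcf_line)
    return "\t".join(fields)


a = "chr1\t147022730\tSV1\tN\t<DEL>\t.\tPASS\tSVLEN=-570334;END=147593064\tGT\t0/1"
b = "chr1\t147022730\tSV2\tN\t<DEL>\t.\tPASS\tSVLEN=-990414;END=148013144\tGT\t1/1"
print(positional_id(a))
print(positional_id(b))
```

With IDs rewritten this way in every input, -m id would merge records only when CHROM, POS, END and ALT all agree.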

Note that the vertical merge happens both with and without --no-index.

Original reporter: ACEnglish/truvari#256
