
VCF rlen calculation can exceed INFO/END by one for symbolic DEL with SVLEN #2019

@ehsanestaji


Hi, I ran into this while investigating pysam issue pysam-developers/pysam#1407.

With a symbolic deletion record that has both END and SVLEN, htslib-backed parsing now reports an rlen/stop one base larger than INFO/END implies.

Minimal VCF:

```
##fileformat=VCFv4.2
##contig=<ID=chr1,length=5000000>
##INFO=<ID=END,Number=1,Type=Integer,Description="End position">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="SV type">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
chr1	2651001	.	N	<DEL>	.	PASS	END=2658000;SVLEN=7000;SVTYPE=DEL	GT	0/1
```
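VCF columns must be tab-separated, and copying the record from a web page can turn tabs into spaces. A small stdlib-only sketch like the following can write the record above to `symbolic_deletion.vcf` (the filename used in the pysam snippet) with real tabs; the `HEADER` and `RECORD` names are just illustrative.

```python
from pathlib import Path

# Header lines from the minimal VCF above; the #CHROM line must be TAB-separated.
HEADER = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=5000000>",
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position">',
    '##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">',
    '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="SV type">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
               "FILTER", "INFO", "FORMAT", "SAMPLE"]),
]

# The single symbolic deletion record, also joined with TABs.
RECORD = "\t".join([
    "chr1", "2651001", ".", "N", "<DEL>", ".", "PASS",
    "END=2658000;SVLEN=7000;SVTYPE=DEL", "GT", "0/1",
])

path = Path("symbolic_deletion.vcf")
path.write_text("\n".join(HEADER + [RECORD]) + "\n")
print(f"wrote {path}")
```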

Observed via pysam, which exposes htslib's bcf1_t.rlen:

```python
import pysam
rec = next(pysam.VariantFile("symbolic_deletion.vcf"))
print(rec.start, rec.stop, rec.rlen)
```

On pysam==0.24.0 / bundled htslib 1.23.1, this prints:

```
2651000 2658001 7001
```

On pysam==0.23.3, the same record prints:

```
2651000 2658000 7000
```

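From reading vcf.c:get_rlen(), the new value appears to come from SVLEN=7000 being converted to end_svlen = v->pos + len + 1, followed by taking the maximum of END and end_svlen. Here v->pos is 0-based (2651000), so end_svlen = 2651000 + 7000 + 1 = 2658001, whereas INFO/END, being 1-based inclusive, corresponds to a 0-based exclusive stop of 2658000. The maximum therefore puts the effective 0-based exclusive stop one base beyond END for this record, giving rlen = 2658001 - 2651000 = 7001.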
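To make the arithmetic concrete, the following sketch computes the two candidate stops in plain Python. Both function names are hypothetical: `interval_from_end` derives the 0-based half-open interval implied by POS and INFO/END, and `modelled_get_rlen` only mirrors the reading of get_rlen() described above. It is not htslib's implementation and handles only a positive SVLEN like the one in this record.

```python
def interval_from_end(pos1: int, end1: int) -> tuple[int, int, int]:
    """0-based half-open (start, stop, rlen) implied by POS and INFO/END.

    POS and END are 1-based and inclusive, so start = POS - 1 and stop = END.
    """
    start = pos1 - 1
    stop = end1
    return start, stop, stop - start


def modelled_get_rlen(pos1: int, end1: int, svlen: int) -> tuple[int, int, int]:
    """Hypothetical model of the behaviour described above, NOT htslib code.

    An SVLEN-derived end (pos + len + 1, with pos 0-based) competes with
    INFO/END, and the larger of the two becomes the 0-based exclusive stop.
    Assumes a positive SVLEN, as in the example record.
    """
    pos0 = pos1 - 1                  # htslib's v->pos is 0-based
    end_info = end1                  # 1-based inclusive END == 0-based exclusive stop
    end_svlen = pos0 + svlen + 1     # end_svlen as read from get_rlen()
    stop = max(end_info, end_svlen)
    return pos0, stop, stop - pos0


print(interval_from_end(2651001, 2658000))        # → (2651000, 2658000, 7000)
print(modelled_get_rlen(2651001, 2658000, 7000))  # → (2651000, 2658001, 7001)
```

The two results match the pysam 0.23.3 and 0.24.0 outputs respectively. In this model the two candidate ends differ only because END - POS is 6999 for this record while SVLEN is 7000; whenever END - POS equals SVLEN, end_svlen and END coincide and the maximum changes nothing.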

Question: when END is present for a symbolic <DEL>, should it remain authoritative for the 0-based exclusive interval exposed as pos + rlen, or is this SVLEN interpretation expected under the newer VCF 4.4/4.5 rlen logic? If this is intended behavior, it would help to have that confirmed so pysam can adjust its expectations and docs. If not, I am happy to help with a small regression test or fix.
