Hi @ACEnglish,
Thank you for creating Truvari. It has been incredibly helpful for SV analysis! However, I ran into an issue with the bcftools command that the wiki page recommends running before truvari collapse.
To start, we merge multiple VCFs (each with their own sample) and ensure there are no multi-allelic entries via:
bcftools merge -m none one.vcf.gz two.vcf.gz | bgzip > merge.vcf.gz
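However, this command can merge non-identical SV events into a single record when they share the same chromosome, start position, REF, and ALT. Different variants then end up on one line, which can lead to interpretation issues with truvari collapse later. In the example below, the two deletions have different END and SVLEN values, yet the merged record carries only the INFO fields of one.vcf, so the second sample's genotype is attached to the wrong interval: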
>> one.vcf:
chr1 147022730 DRAGEN:LOSS:chr1:147022731-147593064 N <DEL> 84 PASS SVLEN=-570334;SVTYPE=CNV;END=147593064;REFLEN=570334;RIGHT_BND=DRAGEN:DEL:6916:0:1:0:0:0;OrigCnvEnd=147593080;SVCLAIM=DJ GT:SM:CN:BC:GC:CT:AC:PE 0/1:0.506255:1:456:0.403456:0.498899:0.503649:3,12
>> two.vcf:
chr1 147022730 DRAGEN:LOSS:chr1:147022731-148013144 N <DEL> 93 PASS SVLEN=-990414;SVTYPE=CNV;END=148013144;REFLEN=990414;SVCLAIM=D GT:SM:CN:BC:GC:CT:AC:PE 0/1:0.513739:1:751:0.411265:0.499147:0.502668:1,1
>> merged.vcf:
chr1 147022730 DRAGEN:LOSS:chr1:147022731-147593064;DRAGEN:LOSS:chr1:147022731-148013144 N <DEL> 93 PASS SVLEN=-570334;SVTYPE=CNV;END=147593064;REFLEN=570334;RIGHT_BND=DRAGEN:DEL:6916:0:1:0:0:0;OrigCnvEnd=147593080;SVCLAIM=DJ GT:SM:CN:BC:GC:CT:AC:PE 0/1:0.506255:1:456:0.403456:0.498899:0.503649:3,12 0/1:0.513739:1:751:0.411265:0.499147:0.502668:1,1
To resolve this, I used the -m id flag instead, which merges records only if they share the same ID:
bcftools merge -m id one.vcf.gz two.vcf.gz | bgzip > merge.vcf.gz
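For anyone who wants to check whether their own inputs are affected before merging, here is a minimal Python sketch. It is not part of Truvari or bcftools; the function names `parse_sv_records` and `find_collisions` are hypothetical. It assumes, based on the behavior shown above, that `-m none` matches records on CHROM, POS, REF, and ALT, and it reports pairs that match on those fields but have different END values.

```python
def parse_sv_records(vcf_lines):
    """Map (CHROM, POS, REF, ALT) -> list of (ID, END) for each data line.

    Assumes tab-separated VCF text; header lines starting with '#' are skipped.
    END is None when the INFO column has no END= entry.
    """
    records = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt = fields[:5]
        info = fields[7] if len(fields) > 7 else ""
        end = None
        for entry in info.split(";"):
            # Exact prefix match, so keys such as OrigCnvEnd= are ignored.
            if entry.startswith("END="):
                end = int(entry[4:])
        records.setdefault((chrom, int(pos), ref, alt), []).append((vid, end))
    return records


def find_collisions(lines_a, lines_b):
    """Return records that share CHROM/POS/REF/ALT across files but differ in END.

    Assumption: these are the records that `bcftools merge -m none` would
    place on a single output line, as in the example above.
    """
    a = parse_sv_records(lines_a)
    b = parse_sv_records(lines_b)
    collisions = []
    for key in sorted(a.keys() & b.keys()):
        for id_a, end_a in a[key]:
            for id_b, end_b in b[key]:
                if end_a != end_b:
                    collisions.append((key[0], key[1], id_a, end_a, id_b, end_b))
    return collisions
```

The functions take any iterable of text lines, so for bgzipped files something like `find_collisions(gzip.open("one.vcf.gz", "rt"), gzip.open("two.vcf.gz", "rt"))` should work. Comparing only END is a simplification; SVLEN or SVTYPE could be checked the same way.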
Would it be possible to update the wiki to reflect this alternative command? I believe it could save future users some time and help ensure accurate results from truvari collapse.
Thanks again for your great work!