Closed
This is not a bug, but a behavior that could perhaps be improved.
I have noticed that when splitting multiallelic variants in different VCF files, you don't necessarily end up with variants in the same order. Here is an example:
(echo "##fileformat=VCFv4.1"
echo "##contig=<ID=chr19>"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
echo -e "chr19\t50359054\t.\tC\tT,A\t.\t.\t.") > in.vcf
Now if I split with bcftools norm --multiallelics -:
$ bcftools norm --multiallelics - --no-version in.vcf
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr19>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr19 50359054 . C T . . .
chr19 50359054 . C A . . .
Lines total/split/realigned/skipped: 1/1/0/0
But if I further sort the file:
$ bcftools norm --multiallelics - --no-version in.vcf | bcftools sort
Writing to /tmp/bcftools-sort.HmQM3B
Lines total/split/realigned/skipped: 1/1/0/0
Merging 1 temporary files
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr19>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr19 50359054 . C A . . .
chr19 50359054 . C T . . .
Cleaning
Done
The order of the variants has changed. I suppose it would be quite a bit of work to rewrite bcftools norm so that it sorts the variants after splitting them, which would make a separate sort unnecessary, but I am reporting this nevertheless, just in case.
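To illustrate why the order flips, here is a rough sketch that mimics the tie-breaking seen in the output above using plain POSIX `sort`. It assumes that records at the same position end up ordered by REF and then ALT, which is only inferred from the sorted output shown here and is not a description of how bcftools sort is implemented. `sort_records` is a hypothetical helper name.

```shell
#!/bin/sh
# A literal tab character, used as the field separator for sort.
TAB=$(printf '\t')

# Hypothetical helper: order header-less VCF records by CHROM, POS, REF, ALT.
# ASSUMPTION: this only imitates the tie-breaking observed above; it sorts
# chromosomes bytewise, not by the contig order declared in the VCF header.
sort_records() {
    LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n -k4,4 -k5,5
}

# The two records as emitted by the split (ALT order T, A from the input line).
printf 'chr19\t50359054\t.\tC\tT\t.\t.\t.\nchr19\t50359054\t.\tC\tA\t.\t.\t.\n' |
    sort_records
```

With the two split records as input, the `A` record comes out before the `T` record, whereas the split itself keeps the ALT order of the original multiallelic line.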
I noticed this because some HTSlib-based tools (such as IMPUTE5 v1.1.4) do not work if the variants are ordered differently across VCF files.
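For anyone who wants to check for this before running a downstream tool, here is a minimal sketch that compares the variant order of two VCF files. It assumes uncompressed, tab-delimited VCFs, and the function names `variant_keys` and `same_variant_order` are hypothetical helpers, not part of bcftools or HTSlib.

```shell
#!/bin/sh
# Hypothetical helper: print one CHROM:POS:REF:ALT key per record, in file
# order, skipping header lines. Assumes an uncompressed, tab-delimited VCF.
variant_keys() {
    awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1"
}

# Hypothetical helper: succeed (exit status 0) only if both files list the
# same variants in the same order.
same_variant_order() {
    [ "$(variant_keys "$1")" = "$(variant_keys "$2")" ]
}

# Demo with the two orderings from this report.
tmp=$(mktemp -d)
printf 'chr19\t50359054\t.\tC\tT\t.\t.\t.\n' >  "$tmp/split.vcf"
printf 'chr19\t50359054\t.\tC\tA\t.\t.\t.\n' >> "$tmp/split.vcf"
printf 'chr19\t50359054\t.\tC\tA\t.\t.\t.\n' >  "$tmp/sorted.vcf"
printf 'chr19\t50359054\t.\tC\tT\t.\t.\t.\n' >> "$tmp/sorted.vcf"

if same_variant_order "$tmp/split.vcf" "$tmp/sorted.vcf"; then
    echo "same order"
else
    echo "order differs"
fi
rm -rf "$tmp"
```

If the check reports a difference, piping every file through the same `bcftools norm ... | bcftools sort` sequence shown above should bring the files into a consistent order.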