-
Notifications
You must be signed in to change notification settings - Fork 260
Description
I have a vcf file where both the genotype (GT) and the read depth (DP) is given for each sample. I would like to set all genotypes where the read depth is below x (DP<x) to missing (GT-->'.'). For that, I use the bcftools +setGT pluging. However, the filtering for genotypes with a low read depth does not do what it should do if I set the limit x too high.
Here is a minimal working example:
bcftools view -H minimal_vcf_file_2_samples_gt_and_dp.vcf.gz
1 1890 . A G 0.2 . . GT:DP 1/1:12 0/0:102 0/1:10 1/1:20
2 1991 . G T 0.9 . . GT:DP 1/1:8 0/1:100 0/0:11 0/0:11
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=12' | bcftools view -H
Filled 8 alleles
1 1890 . A G 0.2 . . GT:DP 1/1:12 0/0:102 ./.:10 1/1:20
2 1991 . G T 0.9 . . GT:DP ./.:8 0/1:100 ./.:11 ./.:11`
That is as expected: the genotypes with read depth below 12 are set to missing.
But here is the problem:
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=101' | bcftools view -H
Filled 6 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP 1/1:8 0/1:100 0/0:11 0/0:11
In the second row, no genotypes are filled.
But when I do:
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -e 'FMT/DP>=100' | bcftools view -H
Filled 12 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP ./.:8 0/1:100 ./.:11 ./.:11
It behaves as expected. Could you help me figure out what is going on?
Maybe it ignores the command if for a specific variant/row no sample is above the threshold?
My version is
bcftools --version
bcftools 1.12
Using htslib 1.12
I would appreciate your help, thank you.
UPDATE: I think the problem is that bcftools is not setting any genotypes for a row/variant if the set of excluded samples is empty. That is a problem since instead of setting genotypes for all samples, it does it for no samples. However, by replacing exclusion by an inclusion (-e to -i) and reversing the filtering condition (>= to <), the bug can be fixed: if the set of genotypes to be set is empty, not doing anything is exactly what you want the command to do:
bcftools +setGT minimal_vcf_file_2_samples_gt_and_dp.vcf.gz -- -t q -n . -i 'FMT/DP<101' | bcftools view -H
Filled 14 alleles
1 1890 . A G 0.2 . . GT:DP ./.:12 0/0:102 ./.:10 ./.:20
2 1991 . G T 0.9 . . GT:DP ./.:8 ./.:100 ./.:11 ./.:11
It might still be good to fix this bug though.