Dear all,
I have a question about how missing values are handled by the != and !~ operators in filtering expressions in bcftools 1.19 (htslib 1.19). To illustrate, here is a minimal VCF file with a single INFO tag named TAG:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=5>
##INFO=<ID=TAG,Number=.,Type=String,Description="Some tag">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_viewVersion=1.19+htslib-1.19
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
chr1 1 . * * . . TAG=a,b,c GT 0/0
chr1 2 . * * . . TAG=a GT 0/0
chr1 3 . * * . . TAG=. GT 0/.
chr1 4 . * * . . TAG=.,. GT ./0
chr1 5 . * * . . TAG=a,.,c GT ./.
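The listing above is rendered with spaces, but VCF columns must be tab-separated. A minimal Python sketch for recreating the example as input.vcf (the file name used in the commands below). The helper names build_vcf and write_vcf are hypothetical, introduced only for this reproduction.

```python
# Rebuild the minimal example VCF with real tab separators.
HEADER = [
    "##fileformat=VCFv4.2",
    '##FILTER=<ID=PASS,Description="All filters passed">',
    "##contig=<ID=chr1,length=5>",
    '##INFO=<ID=TAG,Number=.,Type=String,Description="Some tag">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "##bcftools_viewVersion=1.19+htslib-1.19",
]
COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1"]

# (POS, TAG value, GT) for the five records of the example
RECORDS = [
    (1, "a,b,c", "0/0"),
    (2, "a", "0/0"),
    (3, ".", "0/."),
    (4, ".,.", "./0"),
    (5, "a,.,c", "./."),
]


def build_vcf() -> str:
    """Return the example file as a tab-delimited VCF string."""
    lines = list(HEADER)
    lines.append("\t".join(COLUMNS))
    for pos, tag, gt in RECORDS:
        fields = ["chr1", str(pos), ".", "*", "*", ".", ".", f"TAG={tag}", "GT", gt]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def write_vcf(path: str = "input.vcf") -> None:
    """Write the example VCF to disk (hypothetical helper)."""
    with open(path, "w") as fh:
        fh.write(build_vcf())
```

Calling write_vcf() produces a file that the view commands below can read directly.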
Using the standard string comparison operator, view -i TAG[*]!=".", I can include only the sites with at least one non-missing value for TAG:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=5>
##INFO=<ID=TAG,Number=.,Type=String,Description="Some tag">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -i TAG[*]!="." input.vcf; Date=Sat Jan 18 17:33:45 2025
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
chr1 1 . * * . . TAG=a,b,c GT 0/0
chr1 2 . * * . . TAG=a GT 0/0
chr1 5 . * * . . TAG=a,.,c GT ./.
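The kept sites are consistent with TAG[*] keeping a site when any element satisfies the comparison, with "." treated as a missing value. A hypothetical Python model of that reading (an assumption inferred from the output above, not bcftools' implementation):

```python
# Hypothetical model of TAG[*]!="." (not bcftools source code).
# Assumptions: "." in a String INFO field is a missing value, !="." is true
# only for non-missing elements, and TAG[*] keeps a site if ANY element
# satisfies the comparison.
SITES = {1: ["a", "b", "c"], 2: ["a"], 3: ["."], 4: [".", "."], 5: ["a", ".", "c"]}


def is_missing(value: str) -> bool:
    return value == "."


def keep_ne_missing(values: list[str]) -> bool:
    # True if at least one element is non-missing
    return any(not is_missing(v) for v in values)


kept = [pos for pos, values in SITES.items() if keep_ne_missing(values)]
print(kept)  # [1, 2, 5]
```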
Doing the same with the negated regex operator, view -i TAG[*]!~"\.", does not filter out any variants:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=5>
##INFO=<ID=TAG,Number=.,Type=String,Description="Some tag">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -i TAG[*]!~"\." input.vcf; Date=Sat Jan 18 17:37:27 2025
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
chr1 1 . * * . . TAG=a,b,c GT 0/0
chr1 2 . * * . . TAG=a GT 0/0
chr1 3 . * * . . TAG=. GT 0/.
chr1 4 . * * . . TAG=.,. GT ./0
chr1 5 . * * . . TAG=a,.,c GT ./.
Sorry, I just realized that this may be inconclusive, so here is another example with view -i TAG[*]!~"[A-z]":
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=5>
##INFO=<ID=TAG,Number=.,Type=String,Description="Some tag">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -i TAG[*]!~"[A-z]" input.vcf; Date=Sun Jan 19 15:15:24 2025
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
chr1 3 . * * . . TAG=. GT 0/.
chr1 4 . * * . . TAG=.,. GT ./0
chr1 5 . * * . . TAG=a,.,c GT ./.
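One way to check which example can tell the possibilities apart is to model two candidate semantics for !~ on missing elements. In this hypothetical Python sketch, Python's re module stands in for bcftools' regex engine (an assumption; the patterns used here behave the same in POSIX extended regex), and TAG[*] is again modelled as "keep the site if any element passes". Under these assumptions, "\." gives [1, 2, 5] if the regex is applied to the literal string "." but all five sites if a missing element makes !~ true, which matches the observed output. "[A-z]" gives [3, 4, 5] either way, because the string "." contains no letter.

```python
import re

# Element values per site; "." is the VCF missing value
SITES = {1: ["a", "b", "c"], 2: ["a"], 3: ["."], 4: [".", "."], 5: ["a", ".", "c"]}


def not_match_literal(value: str, pattern: str) -> bool:
    # Hypothesis A: "." is treated as the plain string "." and the regex is applied to it
    return re.search(pattern, value) is None


def not_match_missing_true(value: str, pattern: str) -> bool:
    # Hypothesis B: a missing element makes !~ true without testing the regex
    return value == "." or re.search(pattern, value) is None


def kept_sites(pattern: str, element_test) -> list[int]:
    # Model of TAG[*]: keep a site if ANY element satisfies the test
    return [pos for pos, vals in SITES.items()
            if any(element_test(v, pattern) for v in vals)]


print(kept_sites(r"\.", not_match_literal))         # [1, 2, 5]
print(kept_sites(r"\.", not_match_missing_true))    # [1, 2, 3, 4, 5]
print(kept_sites("[A-z]", not_match_literal))       # [3, 4, 5]
print(kept_sites("[A-z]", not_match_missing_true))  # [3, 4, 5]
```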
Does the negated regex operator automatically evaluate to true for missing values?