bcftools query --list-samples --samples-file can cause segfault #1631

@freeseek

When a sample file is supplied together with the option to list the samples in a VCF, the code in vcfquery.c calls this function:

static void list_columns(args_t *args)
{
    void *has_sample = NULL;
    if ( args->sample_list )
    {
        has_sample = khash_str2int_init();
        int i, nsmpl;
        char **smpl = hts_readlist(args->sample_list, args->sample_is_file, &nsmpl);
        for (i=0; i<nsmpl; i++) khash_str2int_inc(has_sample, smpl[i]);
        free(smpl);
    }

   ...
}

The check that the sample file was opened successfully is missing. This line should follow the call to hts_readlist():

if ( !smpl ) error("Could not parse %s\n", args->sample_list);

Without it, the code can segfault if the file is missing.
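
To make the failure mode concrete, here is a minimal sketch that runs outside bcftools. It does not use htslib: read_list_stub() is a hypothetical stand-in for hts_readlist(), written under the assumption that a failed open returns NULL and leaves the count untouched, and count_listed() is a hypothetical caller with the guard in place.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for hts_readlist(): reads one name per line.
 * Assumption: if the file cannot be opened it returns NULL and does
 * NOT write to *n. A file with no names also yields NULL. */
static char **read_list_stub(const char *path, int *n)
{
    assert(path && n);
    FILE *fp = fopen(path, "r");
    if ( !fp ) return NULL;

    char **list = NULL, line[1024];
    int cnt = 0;
    while ( fgets(line, sizeof line, fp) )
    {
        line[strcspn(line, "\r\n")] = 0;        /* strip the line ending */
        if ( !line[0] ) continue;               /* skip empty lines */
        char **tmp = realloc(list, (cnt + 1) * sizeof *list);
        if ( !tmp ) break;
        list = tmp;
        list[cnt] = malloc(strlen(line) + 1);
        if ( !list[cnt] ) break;
        strcpy(list[cnt], line);
        cnt++;
    }
    fclose(fp);
    *n = cnt;
    return list;
}

/* Returns the number of names read, or -1 if the list could not be read. */
static int count_listed(const char *path)
{
    int i, n;                                   /* n starts uninitialized, like nsmpl */
    char **names = read_list_stub(path, &n);
    if ( !names ) return -1;                    /* the guard list_columns() lacks */
    for (i = 0; i < n; i++) free(names[i]);
    free(names);
    return n;
}
```

Without the guard, the loop would run on an uninitialized count and index a NULL pointer, which is undefined behavior and consistent with the reported segfault.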

On a related note, bcftools query and bcftools convert appear to support exclusion with the "^" prefix, but the support is incomplete. As an example, this is the code in smpl_ilist.c:

char **list = hts_readlist(negate?sample_list+1:sample_list, is_file, &nlist);
if ( !list ) error("Could not parse %s\n", sample_list);

While this is code in vcfconvert.c:

char **smpls = hts_readlist(args->sample_list, args->sample_is_file, &n);
if ( !smpls ) error("Could not parse %s\n", args->sample_list);

and this is code in vcfquery.c:

char **smpls = hts_readlist(args->sample_list, args->sample_is_file, &n);
if ( !smpls ) error("Could not parse %s\n", args->sample_list);

Neither the vcfconvert.c snippet nor the vcfquery.c snippet takes the possibility of exclusion into account.
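
One possible way to handle the prefix uniformly is sketched below, following the pattern in the smpl_ilist.c snippet. The helper names strip_negate_prefix() and keep_sample() are hypothetical and do not exist in bcftools; this is an illustration under that assumption, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split an optional leading '^' off a
 * --samples / --samples-file argument. Sets *negate to 1 when the
 * prefix is present and returns the name or path that follows it. */
static const char *strip_negate_prefix(const char *sample_list, int *negate)
{
    assert(sample_list && negate);
    *negate = sample_list[0] == '^';
    return *negate ? sample_list + 1 : sample_list;
}

/* Hypothetical helper: a header sample is kept when its membership in
 * the list differs from the negate flag (listed and not negated, or
 * not listed and negated). */
static int keep_sample(int in_list, int negate)
{
    return (in_list != 0) != (negate != 0);
}
```

The returned pointer, rather than the raw argument, would then be passed to hts_readlist(). With the header samples A and B and the argument ^A.lst, keep_sample() rejects A and keeps B.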

This leads to inconsistent behavior, as shown in the following example:

(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB"
echo -e "chr1\t10000\t.\tA\tC\t.\t.\t.\tGT\t0/0\t0/1") > A.vcf
echo "A" > A.lst
$ bcftools query --format "[%SAMPLE\n]" --samples-file A.lst A.vcf
A
$ bcftools query --list-samples --samples-file A.lst A.vcf
A
$ bcftools query --format "[%SAMPLE\n]" --samples-file ^A.lst A.vcf
B
$ bcftools query --list-samples --samples-file ^A.lst A.vcf

The last command produces no output (and can potentially segfault).

The following behavior is also slightly inconsistent: bcftools query does not have the --force-samples option that bcftools view has, yet a sample missing from the header is an error with --format but is silently ignored with --list-samples:

echo -e "A\nC" > B.lst
$ bcftools query --format "[%SAMPLE\n]" --samples-file B.lst A.vcf
Sample name mismatch: sample #2 not found in the header
$ bcftools query --list-samples --samples-file B.lst A.vcf
A
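
A shared check would let both code paths report the same error. The sketch below assumes a hypothetical helper, first_missing_sample(), which is not part of bcftools; it only illustrates the comparison between the requested names and the header names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the 0-based index of the first requested
 * sample that is absent from the header, or -1 if all are present. */
static int first_missing_sample(const char *const *hdr, int nhdr,
                                const char *const *req, int nreq)
{
    int i, j;
    assert(hdr && req);
    for (i = 0; i < nreq; i++)
    {
        for (j = 0; j < nhdr; j++)
            if ( !strcmp(req[i], hdr[j]) ) break;
        if ( j == nhdr ) return i;              /* req[i] is not in the header */
    }
    return -1;
}
```

For the header samples A and B and the list A, C, the helper returns index 1, which corresponds to "sample #2" in the error message above.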
