Description
When reading a sample file to list the samples in a VCF, the code in vcfquery.c calls the following function:
static void list_columns(args_t *args)
{
void *has_sample = NULL;
if ( args->sample_list )
{
has_sample = khash_str2int_init();
int i, nsmpl;
char **smpl = hts_readlist(args->sample_list, args->sample_is_file, &nsmpl);
for (i=0; i<nsmpl; i++) khash_str2int_inc(has_sample, smpl[i]);
free(smpl);
}
...
}
There is no check that the sample file was opened and read successfully. The following line is missing after the call to hts_readlist():
if ( !smpl ) error("Could not parse %s\n", args->sample_list);
Without it, the code can segfault if the file is missing.
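The pattern can be illustrated outside bcftools with a minimal, self-contained sketch. Both functions below are hypothetical: read_list_file() is only a stand-in that mimics the relevant property of hts_readlist() (it returns NULL when the file cannot be opened), and count_listed_samples() returns -1 where the real code would call error().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for hts_readlist() in file mode (not the htslib code):
 * reads one name per line and returns NULL if the file cannot be opened.
 * On failure *n is left untouched, so the caller must not rely on it. */
static char **read_list_file(const char *fname, int *n)
{
    assert(fname && n);
    FILE *fp = fopen(fname, "r");
    if ( !fp ) return NULL;
    char **list = malloc(sizeof *list);   /* non-NULL even for an empty file */
    if ( !list ) { fclose(fp); return NULL; }
    char buf[1024];
    int cnt = 0;
    while ( fgets(buf, sizeof buf, fp) )
    {
        buf[strcspn(buf, "\r\n")] = '\0';  /* strip the line ending */
        if ( !buf[0] ) continue;           /* skip empty lines */
        char **tmp = realloc(list, (cnt + 1) * sizeof *list);
        if ( !tmp ) break;
        list = tmp;
        list[cnt] = malloc(strlen(buf) + 1);
        if ( !list[cnt] ) break;
        strcpy(list[cnt++], buf);
    }
    fclose(fp);
    *n = cnt;
    return list;
}

/* Returns the number of names read, or -1 if the list could not be read. */
static int count_listed_samples(const char *fname)
{
    int i, nsmpl = 0;
    char **smpl = read_list_file(fname, &nsmpl);
    if ( !smpl ) return -1;   /* the kind of check missing from list_columns() */
    for (i=0; i<nsmpl; i++) free(smpl[i]);
    free(smpl);
    return nsmpl;
}
```

Without the NULL check, the loop would run over a NULL pointer with a count that was never set by the reader, which is consistent with the segfault described above.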
Slightly related: bcftools query and bcftools convert appear to support exclusion with the "^" prefix, but the support is incomplete. As an example, this is the code in smpl_ilist.c:
char **list = hts_readlist(negate?sample_list+1:sample_list, is_file, &nlist);
if ( !list ) error("Could not parse %s\n", sample_list);
While this is the code in vcfconvert.c:
char **smpls = hts_readlist(args->sample_list, args->sample_is_file, &n);
if ( !smpls ) error("Could not parse %s\n", args->sample_list);
and this is the code in vcfquery.c:
char **smpls = hts_readlist(args->sample_list, args->sample_is_file, &n);
if ( !smpls ) error("Could not parse %s\n", args->sample_list);
Neither of these last two snippets takes the possibility of exclusion into account.
This leads to inconsistent behavior, as shown in the following example:
(echo "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB"
echo -e "chr1\t10000\t.\tA\tC\t.\t.\t.\tGT\t0/0\t0/1") > A.vcf
echo "A" > A.lst
$ bcftools query --format "[%SAMPLE\n]" --samples-file A.lst A.vcf
A
$ bcftools query --list-samples --samples-file A.lst A.vcf
A
$ bcftools query --format "[%SAMPLE\n]" --samples-file ^A.lst A.vcf
B
$ bcftools query --list-samples --samples-file ^A.lst A.vcf
The last command produces no output (and can potentially segfault).
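A minimal sketch of how the "^" prefix could be handled uniformly is given below. It mirrors the `negate?sample_list+1:sample_list` idea quoted from smpl_ilist.c, but the helper names strip_negation() and sample_selected() are hypothetical and do not exist in bcftools.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a bcftools function): splits a --samples or
 * --samples-file argument into the exclusion flag and the bare list/file name. */
static const char *strip_negation(const char *spec, int *negate)
{
    assert(spec && negate);
    *negate = spec[0] == '^';
    return *negate ? spec + 1 : spec;
}

/* Hypothetical helper: decides whether a header sample should be reported,
 * given the names read from the list and the exclusion flag. */
static int sample_selected(const char *name, char **list, int nlist, int negate)
{
    int i, found = 0;
    for (i=0; i<nlist; i++)
        if ( !strcmp(name, list[i]) ) { found = 1; break; }
    return negate ? !found : found;
}
```

With this approach, the name passed to the list reader would be "A.lst" rather than "^A.lst", and for the example above --list-samples would report B, matching the output of --format "[%SAMPLE\n]".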
The following behavior is also inconsistent: bcftools query does not have the --force-samples option that bcftools view has, yet it handles a sample name missing from the header differently depending on whether --format or --list-samples is used:
echo -e "A\nC" > B.lst
$ bcftools query --format "[%SAMPLE\n]" --samples-file B.lst A.vcf
Sample name mismatch: sample #2 not found in the header
$ bcftools query --list-samples --samples-file B.lst A.vcf
A
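One way to make the two modes agree would be to run the same validation before listing samples. The sketch below is only an illustration under that assumption: first_missing_sample() is a hypothetical helper, not existing bcftools code, and it reports the 1-based position of the first requested name that is absent from the header.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based index of the first requested
 * sample that is not present in the header, or 0 if all are present. */
static int first_missing_sample(char **requested, int nreq, char **header, int nhdr)
{
    assert(requested && header);
    int i, j;
    for (i=0; i<nreq; i++)
    {
        int found = 0;
        for (j=0; j<nhdr; j++)
            if ( !strcmp(requested[i], header[j]) ) { found = 1; break; }
        if ( !found ) return i + 1;
    }
    return 0;
}
```

For the B.lst example (requested A and C, header A and B) the helper returns 2, which corresponds to the "sample #2 not found in the header" message printed in the --format case.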