Add information about samples to VCF header

Follow up to #2445.

The default sample names in our VCF mapping are always ``tsk_0``, ``tsk_1``, etc., and these may not have anything to do with the IDs of the individuals themselves. We should update the VCF output in order to provide information about the sample node IDs corresponding to each VCF sample, and (if relevant) the individual ID. This will make it much easier to map information from the VCF back into the tskit data model.

VCF [version 4.3](http://samtools.github.io/hts-specs/VCFv4.3.pdf) declares a "Sample field format" (section 1.4.8). The complete definition is included here for easy reference:

```
It is possible to define sample to genome mappings as shown below:
##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>
##META=<ID=Disease,Type=String,Number=.,Values=[None, Cancer]>
##META=<ID=Ethnicity,Type=String,Number=.,Values=[AFR, CEU, ASN, MEX]>
##META=<ID=Tissue,Type=String,Number=.,Values=[Blood, Breast, Colon, Lung, ?]>
##SAMPLE=<ID=Sample1,Assay=WholeGenome,Ethnicity=AFR,Disease=None,Description="Patient germline genome from unaffected",DOI=url>
##SAMPLE=<ID=Sample2,Assay=Exome,Ethnicity=CEU,Disease=Cancer,Tissue=Breast,Description="European patient exome from breast cancer">
```

So, we could do something like this (the syntax is not quite right for NodeIds, I think):
```
##META=<ID=NodeIds,Type=Integer,Number=.>
##META=<ID=IndividualId,Type=Integer,Number=1>
##SAMPLE=<ID=tsk_0,NodeIds=0,1,IndividualId=0>
```
If we're in the "no individual data" case, then we don't include an individual ID. It's probably handy to always include the node IDs, to make it easier to backtrack this information.

I guess this information will be quite large sometimes, so it's probably worth providing an option to suppress it (``sample_header_info=False``, I guess?)
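
To make the proposal concrete, here is a rough sketch of how the header lines might be generated. Everything in it is an assumption for illustration: the function name `sample_header_lines`, its arguments, and the `NodeIds`/`IndividualId` keys are hypothetical, not existing tskit API. It also quotes the node ID list so that the comma inside the value isn't confused with the comma separating `key=value` pairs, which is one possible way around the syntax problem rather than a settled choice.

```python
def sample_header_lines(
    sample_names, node_ids, individual_ids=None, sample_header_info=True
):
    """
    Return hypothetical ##META / ##SAMPLE header lines for a VCF.

    sample_names: VCF sample names, e.g. ["tsk_0", "tsk_1"]
    node_ids: one list of node IDs per VCF sample
    individual_ids: one individual ID per VCF sample, or None in the
        "no individual data" case
    sample_header_info: if False, suppress the extra header lines
    """
    if not sample_header_info:
        return []

    lines = ["##META=<ID=NodeIds,Type=Integer,Number=.>"]
    if individual_ids is not None:
        lines.append("##META=<ID=IndividualId,Type=Integer,Number=1>")

    for j, name in enumerate(sample_names):
        # Assumption: quote the list so its commas stay inside one value.
        nodes = ",".join(str(u) for u in node_ids[j])
        fields = [f"ID={name}", f'NodeIds="{nodes}"']
        if individual_ids is not None:
            fields.append(f"IndividualId={individual_ids[j]}")
        lines.append("##SAMPLE=<" + ",".join(fields) + ">")
    return lines


if __name__ == "__main__":
    for line in sample_header_lines(["tsk_0", "tsk_1"], [[0, 1], [2, 3]], [0, 1]):
        print(line)
```

Node IDs are always written, and the individual ID only appears when individual data is available, matching the two cases described above.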
