Add --high_confidence option for dual hybrid genomes #9

We have come across certain positions in the genome where different strains appear to have the same SNP (indicated by the GT/genotype field), but one of the strains failed the FI/FILTER criterion (1 is PASS, 0 is FAIL). Here is an example:

**GT**:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:**FI**

**1/1**:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:**1** (129)

**1/1**:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:**0** (Cast)
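To make the example concrete, here is a minimal sketch of pulling `GT` and `FI` out of a sample column by pairing it with the FORMAT keys. `parse_sample` is a hypothetical helper, not part of SNPsplit, and it assumes the sample column carries every FORMAT field (VCF technically allows trailing fields to be dropped).

```python
from typing import Dict


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Map each FORMAT key (GT, GQ, ..., FI) to its value in one sample column.

    Hypothetical helper: assumes no trailing fields were omitted.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError(f"expected {len(keys)} fields, got {len(values)}")
    return dict(zip(keys, values))


FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"
sample_129 = parse_sample(FORMAT, "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1")
sample_cast = parse_sample(FORMAT, "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0")
print(sample_129["GT"], sample_129["FI"])    # 1/1 1
print(sample_cast["GT"], sample_cast["FI"])  # 1/1 0
```

Both strains report the same genotype; only the `FI` value distinguishes them.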

For single hybrid genomes we would include this position in the 129 genome (homozygous 1/1 SNP with FI=1), but ignore it for the Cast genome (also a homozygous 1/1 SNP, but with FI=0, i.e. it failed the high-confidence filter). This seems like a reasonable approach.
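The single-hybrid rule described above can be sketched as a one-line predicate. `keep_for_single_hybrid` is a hypothetical name, and the sketch assumes only homozygous `1/1` calls are considered, as in the example; it is not SNPsplit's actual code.

```python
from typing import Dict


def keep_for_single_hybrid(sample: Dict[str, str]) -> bool:
    """Keep a position for one strain's genome only if it is a homozygous
    ALT call (GT 1/1) that also passed the FI filter (FI 1).

    Hypothetical sketch of the rule as described in this issue.
    """
    return sample.get("GT") == "1/1" and sample.get("FI") == "1"


print(keep_for_single_hybrid({"GT": "1/1", "FI": "1"}))  # True  (129)
print(keep_for_single_hybrid({"GT": "1/1", "FI": "0"}))  # False (Cast)
```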

For dual hybrid genomes, however, such positions can be a problem: when the 129 and Cast SNP lists are compared with each other, it looks as if there is a SNP between 129 and Cast, even though there was evidence that the genotype was the same (1/1) in both 129 and Cast. The Cast call merely did not pass the threshold to count as a high-confidence SNP.
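A minimal sketch of how the spurious SNP arises, assuming each strain's list only contains its high-confidence (FI=1) calls and the two lists are compared position by position. `naive_dual_hybrid` and position `100` are hypothetical, for illustration only.

```python
from typing import Dict, Set


def naive_dual_hybrid(snps_a: Dict[int, str], snps_b: Dict[int, str]) -> Set[int]:
    """Positions whose genotype differs between two per-strain SNP lists.

    Each dict maps a position to the genotype of a high-confidence (FI=1)
    call. Calls that failed FI are simply absent, so "failed the filter"
    becomes indistinguishable from "no SNP here".
    """
    return {
        pos
        for pos in snps_a.keys() | snps_b.keys()
        if snps_a.get(pos) != snps_b.get(pos)
    }


snps_129 = {100: "1/1"}
snps_cast: Dict[int, str] = {}  # Cast's 1/1 call at 100 failed FI, so it was dropped
print(naive_dual_hybrid(snps_129, snps_cast))  # {100}: a spurious 129/Cast SNP
```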

As a solution, could we change the `SNPsplit genome preparation` to store the `FI` value as well as the `GT` genotype, and only use a position for the dual-hybrid SNP list if it was called with high confidence (i.e. `FI=1`) in both strains? Thanks to @nservant for helpful discussions in this regard.
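One way the proposed `--high_confidence` behaviour could work is sketched below. It is a hypothetical illustration, not SNPsplit code: `Call` and `high_confidence_dual_hybrid` are made-up names, positions `100` and `200` are invented, and it assumes one ALT allele per position and that each strain has a call (including `0/0`) at every position being compared.

```python
from typing import Dict, NamedTuple, Set


class Call(NamedTuple):
    gt: str  # genotype, e.g. "1/1" or "0/0"
    fi: int  # FI filter value: 1 = PASS, 0 = FAIL


def high_confidence_dual_hybrid(a: Dict[int, Call], b: Dict[int, Call]) -> Set[int]:
    """Positions that differ between two strains, using only positions
    called with high confidence (FI=1) in both strains."""
    result = set()
    for pos in a.keys() & b.keys():  # only positions with a call in both strains
        ca, cb = a[pos], b[pos]
        if ca.fi != 1 or cb.fi != 1:
            continue  # not high confidence in both strains: skip the position
        if ca.gt != cb.gt:
            result.add(pos)  # confident and different genotypes: a real SNP
    return result


calls_129 = {100: Call("1/1", 1), 200: Call("1/1", 1)}
calls_cast = {100: Call("1/1", 0), 200: Call("0/0", 1)}
print(high_confidence_dual_hybrid(calls_129, calls_cast))  # {200}
```

Keeping `FI` next to `GT` is what allows position 100 to be recognised as "same genotype, low confidence in Cast" and skipped, rather than reported as a 129/Cast difference.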
