Add --high_confidence option for dual hybrid genomes #9

Closed
@FelixKrueger

We have come across certain positions in the genome where different strains appear to have the same SNP (indicated by the GT/genotype field), but one of the strains failed the FI/FILTER criterion (1 is PASS, 0 is FAIL). Here is an example:

GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI
1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1 (129)
1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0 (Cast)

For single hybrid genomes we would include this position in the 129 genome (1/1 homozygous SNP, first sample), but would ignore the position for the Cast genome (also a 1/1 homozygous SNP, but it failed the high confidence FI filter, second sample). This seems like a reasonable approach.
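To make the single-hybrid rule concrete, here is a minimal Python sketch that splits a sample column using the FORMAT string from the example and checks GT and FI. It is only an illustration: SNPsplit itself is written in Perl, and the function names `parse_sample` and `is_high_confidence_snp` are hypothetical, not part of SNPsplit. The genotype check is also simplified to "homozygous non-reference".

```python
# FORMAT string and sample columns copied from the example above.
FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"
SAMPLE_129 = "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1"
SAMPLE_CAST = "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0"


def parse_sample(format_field, sample_field):
    """Map each FORMAT key (GT, FI, ...) to the value in the sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT and sample column have different field counts")
    return dict(zip(keys, values))


def is_high_confidence_snp(format_field, sample_field):
    """Hypothetical helper: True for a homozygous non-reference genotype with FI=1."""
    fields = parse_sample(format_field, sample_field)
    alleles = fields["GT"].replace("|", "/").split("/")
    homozygous_alt = (
        len(alleles) == 2
        and alleles[0] == alleles[1]
        and alleles[0] not in ("0", ".")
    )
    return homozygous_alt and fields["FI"] == "1"


print("129 :", is_high_confidence_snp(FORMAT, SAMPLE_129))
print("Cast:", is_high_confidence_snp(FORMAT, SAMPLE_CAST))
```

With the two sample columns above, the 129 call passes (GT 1/1, FI 1) and the Cast call is rejected (GT 1/1, FI 0), which matches the single-hybrid behaviour described.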

For dual hybrid genomes, such positions might be a problem though: when the 129 and Cast SNP lists are compared with each other, it looks as if there is a SNP between 129 and Cast, even though there was evidence that the genotype was the same (1/1) in 129 and Cast. The call simply did not pass the threshold to count as a high confidence SNP in Cast.

As a solution, can we change the SNPsplit genome preparation to store the FI value as well as the GT genotype, and only use a position for a dual-hybrid SNP list if it was measured with high confidence (i.e. FI=1) in both strains? Thanks to @nservant for helpful discussions in this regard.
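The proposed rule could look roughly like the following Python sketch. This is an assumption about how such a filter might behave, not the SNPsplit implementation (which is in Perl): the helper names `gt_and_fi` and `keep_for_dual_hybrid` are hypothetical, and genotypes are compared as plain strings from the same VCF record.

```python
FORMAT = "GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI"
SAMPLE_129 = "1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1"
SAMPLE_CAST = "1/1:15:4:0:79,15,0:67,12,0:2:24:4:0,0,4,0:0:-0.556411:.:0"


def gt_and_fi(sample_field, format_field=FORMAT):
    """Return the (GT, FI) pair for one strain's sample column."""
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    return fields["GT"], fields["FI"]


def keep_for_dual_hybrid(sample_a, sample_b, format_field=FORMAT):
    """Hypothetical filter: keep a position for the dual-hybrid SNP list only
    if both strains have FI=1 and their genotypes actually differ."""
    gt_a, fi_a = gt_and_fi(sample_a, format_field)
    gt_b, fi_b = gt_and_fi(sample_b, format_field)
    if fi_a != "1" or fi_b != "1":
        # At least one strain is not high confidence: skip the position
        # rather than report a spurious strain-specific SNP.
        return False
    return gt_a != gt_b


# The position from this issue: same genotype, but Cast failed the FI filter.
print(keep_for_dual_hybrid(SAMPLE_129, SAMPLE_CAST))

# Made-up pair with a reduced "GT:FI" format: both high confidence, genotypes differ.
print(keep_for_dual_hybrid("1/1:1", "0/0:1", format_field="GT:FI"))
```

Under this rule the position from the example is dropped from the dual-hybrid list, because Cast has FI=0, while the made-up second pair is kept.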
