Bug in multi-allele matching? #977

Open
@jeromekelleher

I've spent the day tracking down a bug in sc2ts, and it points towards a problem with tsinfer's multi-allele handling (and the allele-index rotation used to deal with a non-zero ancestral state). The version of sc2ts at e5fd80369b2e0813c2e46b1810980bb640054ed6, which was based on my experimental HMM branch #959, loaded data into a tsinfer SampleData instance, ran some matches, and got the results out by overriding the tsinfer.SampleMatcher class.

Specifically, what seems to be happening is that when matching strain ERR4207042 against the 2020-03-11 ARG, tsinfer misses that this sample has a T at this site rather than a G (the ancestral state). The haplotype we send in looks OK, but we match to a node that carries the ancestral state without reporting a mutation in the output. It's not clear to me how this happened; my guess is that there's some problem in the ancestral-state allele rotation code.
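To make the suspected failure mode concrete, here is a minimal sketch of what allele-index rotation might look like. This is not tsinfer's code: `rotate_alleles` and `encode_haplotype` are hypothetical helpers, assuming each site stores its alleles as a list and a haplotype is a list of per-site allele indexes with -1 for missing data. The ancestral state is moved to index 0 and every other index shifts by the same offset, so the query haplotype must be passed through exactly the same mapping as the reference data.

```python
def rotate_alleles(alleles, ancestral_state):
    """Hypothetical helper: reorder one site's alleles so the ancestral state has index 0.

    Returns the rotated allele list and a list mapping each original
    allele index to its index in the rotated list.
    """
    k = alleles.index(ancestral_state)  # current index of the ancestral state
    n = len(alleles)
    rotated = [alleles[(i + k) % n] for i in range(n)]
    # The allele at original index j ends up at index (j - k) mod n.
    old_to_new = [(j - k) % n for j in range(n)]
    return rotated, old_to_new


def encode_haplotype(haplotype, old_to_new_per_site):
    """Hypothetical helper: translate original allele indexes into rotated ones.

    Missing data (-1) is passed through unchanged.
    """
    return [
        g if g < 0 else old_to_new[g]
        for g, old_to_new in zip(haplotype, old_to_new_per_site)
    ]


# Single site where the ancestral state G is not the first allele listed.
alleles = ["T", "G"]
rotated, old_to_new = rotate_alleles(alleles, "G")
print(rotated)     # ['G', 'T']
print(old_to_new)  # [1, 0]

# A sample carrying T has original index 0; after rotation it should be 1.
query = [0]
print(encode_haplotype(query, [old_to_new]))  # [1], i.e. the derived T
# If the rotation step were skipped for the query, index 0 would be read
# against the rotated alleles and the sample would look like it carries G.
print(rotated[query[0]])  # G
```

This is only one way a rotation mismatch could produce the symptom above (a derived T read as the ancestral G); it says nothing about where, or whether, tsinfer actually does this.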

I say this because sc2ts now uses the low-level components directly, bypassing the SampleData class entirely, and it makes these matches correctly (which is how I spotted the problem). So I don't think the problem is in the HMM itself; it's more likely an issue in the high-level wrapping code.
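Comparing the high-level and low-level code paths could be made systematic with a simple invariant: at every site, the state inherited from the node the query copies from, updated by any mutation the matcher reports, must equal the query's allele. The sketch below is hypothetical and independent of the tsinfer API; `mismatched_sites` is an illustrative name, and it assumes the matched path has already been resolved into a per-site list of inherited states and that reported mutations are available as a dict from site index to derived allele. Missing query data (`None`) is skipped.

```python
def mismatched_sites(query, inherited, mutations):
    """Hypothetical check: return sites where the match output does not reproduce the query.

    query:     allele observed in the sample at each site (None = missing)
    inherited: allele carried by the copied node at each site
    mutations: dict mapping site index -> derived allele reported by the matcher
    """
    bad = []
    for site, (observed, parent_state) in enumerate(zip(query, inherited)):
        if observed is None:
            continue  # missing data places no constraint on the match
        state = mutations.get(site, parent_state)
        if state != observed:
            bad.append(site)
    return bad


# The symptom described above: the sample has T, the copied node has the
# ancestral G, and no mutation is reported, so site 0 is flagged.
print(mismatched_sites(["T", "A"], ["G", "A"], {}))        # [0]
print(mismatched_sites(["T", "A"], ["G", "A"], {0: "T"}))  # []
```

Running a check like this on the output of both code paths for the same strain would show whether they disagree only at multiallelic or non-zero-ancestral sites.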

I don't have time to create a reproducible example I'm afraid.

I don't know whether we need to do anything about this pre-release. I guess we don't support matching on multiallelic sites anyway, and we're pretty sure the biallelic case is working fine, right?