Options for dealing with missing flanking data in samples #224

As discussed with @jeromekelleher, it would be useful to be able to produce inferred tree sequences where the input samples have missing data to the left and right of a known sequence (e.g. fragments from a sequencer). This ability has been removed from https://github.com/tskit-dev/tsinfer/pull/169, as it seems easier and more flexible to handle it as a post-processing step, after running inference-with-imputation on the fragmented sequences. For example, this would also allow large sections of missing data in the middle of a sample (not only in the flanking regions) to be marked as "truly missing and not for imputation".

I intend to write a function to do this independently of the inference process. All it needs to do is take an inferred TS and its corresponding SampleData file, remove the edges that link a sample to the tree at sites that are missing in that particular sample, and then call `simplify(keep_unary=True)`. This issue replaces #153, and should eventually subsume #173.
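A minimal sketch of what such a function might look like, not a proposed final implementation. It assumes the missing regions have already been worked out from the SampleData file and are passed in as a hypothetical `known` dict mapping each sample node ID to the half-open intervals where that sample has real data. The names `clip_sample_edges` and `trim_missing_flanks` are made up for illustration. The clipping step is plain Python. Only the second function touches tskit, and it discards any edge metadata.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, float, int, int]  # (left, right, parent, child)
Interval = Tuple[float, float]  # half-open [left, right)


def clip_sample_edges(
    edges: List[Edge], known: Dict[int, List[Interval]]
) -> List[Edge]:
    """Restrict edges whose child is a listed sample to that sample's known intervals.

    `known` is a hypothetical input: sample node ID -> intervals with real data.
    Edges whose child is not in `known` are passed through unchanged.
    """
    clipped = []
    for left, right, parent, child in edges:
        if child not in known:
            clipped.append((left, right, parent, child))
            continue
        for k_left, k_right in known[child]:
            # Keep only the overlap between the edge and the known interval.
            lo, hi = max(left, k_left), min(right, k_right)
            if lo < hi:
                clipped.append((lo, hi, parent, child))
    return clipped


def trim_missing_flanks(ts, known: Dict[int, List[Interval]]):
    """Detach samples from the trees outside their known intervals, then simplify.

    `ts` is assumed to be a tskit.TreeSequence (e.g. the output of inference).
    """
    tables = ts.dump_tables()
    old_edges = [(e.left, e.right, e.parent, e.child) for e in ts.edges()]
    tables.edges.clear()
    for left, right, parent, child in clip_sample_edges(old_edges, known):
        tables.edges.add_row(left=left, right=right, parent=parent, child=child)
    tables.sort()  # clipping can break the required edge ordering
    tables.simplify(keep_unary=True)
    return tables.tree_sequence()
```

Something like `trim_missing_flanks(inferred_ts, {0: [(2000, 7000)]})` would then leave sample node 0 isolated outside positions 2000 to 7000. The sketch does not deal with mutations that sit directly above a sample inside a region being trimmed, which would probably need to be removed as well.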
