-
Notifications
You must be signed in to change notification settings - Fork 16
Allow missing data in samples #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow missing data in samples #169
Conversation
|
Ping @marianne-aspbury - this is not quite ready yet, as the Li & Stevens matching algorithm doesn't deal with missing data properly, but it's a good start, and the test suite passes. Most of the fundamental work is in algorithm.py |
Codecov Report
@@ Coverage Diff @@
## master #169 +/- ##
==========================================
- Coverage 91.86% 91.79% -0.08%
==========================================
Files 15 15
Lines 4489 4523 +34
Branches 807 818 +11
==========================================
+ Hits 4124 4152 +28
- Misses 245 249 +4
- Partials 120 122 +2
Continue to review full report at Codecov.
|
a8ad239 to
7dd5061
Compare
|
NB: currently failing tests because it relies on tskit v0.2.0. Tests pass fine on my machine with the new tskit, so I'm simply waiting for the next tskit release before seeking review. |
4217438 to
9c26fc0
Compare
|
Currently the I have added in a test case, |
c493191 to
0622d85
Compare
0622d85 to
fddef64
Compare
|
@jeromekelleher I think this is ready to merge. It contains a few separate fixes, so we might want to split them apart. Quite a few are interdependent, however - e.g. upgrading to tskit 0.2.1 means fixing the uint8/int8 haplotype storage, and fixing the appveyor build, then dealing with the missing data in int8 haplotypes means fixing the map_mutations method, so I haven't tried to disentangle them all. |
|
Can you bring this up to date please @hyanwong? Probably the simplest thing to do is rebase, dropping the commits before 47a0153. It's probably easiest to drop the last commit also, as appveyor has been fixed up on master. You may want to separate out the map_mutations code from missing data handling, since this is quite a bit change in itself worthy of its own PR. |
d3e7bdf to
942fa8d
Compare
|
I forgot that the map_mutations code needs committing before this will pass tests. Could you merge #185 first @jeromekelleher ? |
7835114 to
4beec4e
Compare
|
Just to confirm, this will handle missing data in samples, but not missing sites, correct? So if not all the samples have information at a given site this would work, but if the sample_data file does not contain all the sites in an ancestors_tree_sequence the code still would break. |
|
Yes, comments in the function So I guess it's not set up for changing the inference sites between the ancestors TS and the |
|
My work around for that check was to remove sites in the |
|
Yes, it's not so simple to change the inference sites in the middle of the pathway, as the indexes will be all up the spout. I'll open another issue about this. |
4bf9085 to
478f05f
Compare
|
The current implementation allows sites with missing data to be used for inference, but doesn't change the matching algorithm, which is why All of the work, apart from the incorporation of missing data into matching, is now in this PR, so it should be relatively easy to squash all my commits and merge once the L&S plumbing is done. |
d645c63 to
db7f360
Compare
|
@hyanwong, I've rebased this and made minimal changes to make the tests pass. I suggest we merge this and then I'll finish up the process of implementing missing data. |
db7f360 to
5bb2089
Compare
|
Perfect, thanks |
For brief review, but not merging yet. Addresses #153