-
Notifications
You must be signed in to change notification settings - Fork 77
Two locus branch stats python prototype #2934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jeromekelleher I'm still seeing issues with the build cache. Is there anything I can do to help here? I'm not in a huge rush for what it's worth, just wanted to close this out. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2934 +/- ##
==========================================
- Coverage 89.63% 89.62% -0.01%
==========================================
Files 29 29
Lines 30174 30174
Branches 5873 5873
==========================================
- Hits 27046 27043 -3
- Misses 1789 1791 +2
- Partials 1339 1340 +1
Flags with carried forward coverage won't be shown. Click here to find out more. |
@benjeffery any advice here? |
I'm in progress with this at #2935 |
We've pushed through a few changes in the CI setup, can you rebase now please @lkirk? Hopefully things will go a bit more smoothly now. |
445707c
to
25a8ddc
Compare
Thank you @benjeffery for fixing that! @jeromekelleher This is ready now, I chopped many more slow tests from this. I realized that there are only 2 cores being used on the test servers and I hadn't anticipated how the slow tests would compound in runtime. As written, things were taking ~1.5 hours and I've adjusted things so now only important tests are considered, adding about 2 minutes to the test runtime. Does this seem reasonable? |
2 minutes is still quite a bit. You could just use the naive version on a handful of small cases to verify things are lining up as expected, and defer doing tests on the full range of examples until the C version is in place? |
Any slow tests should be marked with |
25a8ddc
to
acd0d47
Compare
@jeromekelleher Agreed, I've cut a couple more tests to reach a minimal subset and now the runtime is the same -- at least in the number of minutes, I did a quick check of the runtimes on main and compared to these and everything seems to be running in the same amount of time, within a minute at least. Slowest added test runs in ~.2 seconds on my relatively fast computer. Here's the times from the latest run on main: So, it appears that the difference in runtime is sub-minute now. Is that reasonable? If not, I can just mark as slow. |
@Mergifyio rebase |
Currently, this algorithm creates a matrix of LD, performing a pairwise comparison of all trees in the tree sequence. This implementation lacks windows/positions, sample sets and polarisation. The outputs of the code produce results in units of branch length, needing to be multiplied by mu^2 or divided by product of the total branch length of the two trees. This algorithm works by keeping a running sum of the statistic between two trees, updating each time we encounter a branch addition or removal. The tricky part is that we have to remove or add LD contributed by samples that already existed or that will remain under a given node after the addition/removal of branches. We include a validation against the original formulation of this problem, by including an implementation that was described in McVean 2002. The original formulation computing the covariance of tMRCAs of 2, 3, and 4 samples of individuals on the trees in question. This implementation has several limitations 1) it is very slow and 2) it does not work for trees that are decapitated, as certain samples do not have MRCAs.
✅ Branch has been successfully rebased |
acd0d47
to
343a89b
Compare
I've re-opened a PR to avoid some issues with build caching. Please see the original discussion on #2912.
Currently, this algorithm creates a matrix of LD, performing a pairwise comparison of all trees in the tree sequence.
This implementation lacks windows/positions, sample sets and polarisation. The outputs of the code produce results in units of branch length, needing to be multiplied by mu^2 or divided by product of the total branch length of the two trees.
This algorithm works by keeping a running sum of the statistic between two trees, updating each time we encounter a branch addition or removal. The tricky part is that we have to remove or add LD contributed by samples that already existed or that will remain under a given node after the addition/removal of branches.
We include a validation against the original formulation of this problem, by including an implementation that was described in McVean
2002. The original formulation computing the covariance of tMRCAs of 2, 3, and 4 samples of individuals on the trees in question. This implementation has several limitations 1) it is very slow and 2) it does not work for trees that are decapitated, as certain samples do not have MRCAs.