Faster prior variance calculation #306
Some interesting (but possibly useless) applications of this algorithm:
Really neat. Thanks for this, Nate.
Darn, this is working but fried some unit tests.
ec470a9 to ec28ff3
This is good to go. I moved the original, slow variance calculation into |
LGTM. Maybe it's worth keeping a few tests for the simpler version? Also, using the approximation only when N>20,000 seems quite a high threshold. I assume it's so fast that this is fine? And I assume that's when you removed the "Calculating Prior Variances" tqdm feedback?
Also, for very large inferences, if e.g. N=1e6 and the user has set |
Hmm, I removed the progress bar because the variance calculation is done recursively in numba, so there's no way to output progress without slowing things down dramatically.
I think it takes around 1/2 a minute, according to the benchmark above? That's really when the cost starts kicking in.
Oh, well, if there's no obvious trick there then just drop it. We could output a logging warning that this might take a while if |
Yes, that's a good idea.
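A minimal sketch of the warning idea discussed above, using only Python's standard `logging` module. The function name `warn_if_slow`, the `use_approx` flag, and the `slow_tips` cutoff are all hypothetical; the thread does not say which condition should trigger the warning, so the cutoff here is a placeholder assumption rather than a value from the PR.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical cutoff: NOT taken from the PR, chosen only for illustration.
SLOW_EXACT_TIPS = 100_000


def warn_if_slow(num_tips, use_approx, slow_tips=SLOW_EXACT_TIPS):
    """Log a warning when exact prior variances are requested for many tips.

    Returns True if a warning was logged, False otherwise.
    """
    if not use_approx and num_tips >= slow_tips:
        logger.warning(
            "Calculating exact prior variances for %d tips; this may take a while",
            num_tips,
        )
        return True
    return False
```

Because the recursion runs inside compiled code with no progress output, a single message logged before the call is about the only cheap feedback available.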
Ok, I think this is done, except I'm not sure about the default for using the approximate prior. I suppose that if there's missing data, the calculation has to be repeated for every realized # of tips across the tree sequence? So, I set the default to 10,000, which should take on the order of a second according to the benchmark above. Seem OK?
A second sounds OK to me, as does a default of 10,000, so I'll just merge this. Thanks a lot Nate!
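To illustrate the default settled on above, here is a rough sketch of how a threshold of 10,000 might decide between the exact and approximate calculation for each distinct tip count that occurs across a tree sequence. The function name `choose_prior_methods` and its parameters are hypothetical, and whether the real comparison is strict (`>`) or inclusive is an assumption.

```python
def choose_prior_methods(tip_counts, approx_threshold=10_000):
    """Map each distinct tip count to "exact" or "approximate".

    Assumes (hypothetically) that the approximation is used only when the
    number of tips strictly exceeds approx_threshold.
    """
    return {
        n: ("approximate" if n > approx_threshold else "exact")
        for n in sorted(set(tip_counts))
    }


# With missing data, several tip counts can occur; each is decided separately.
methods = choose_prior_methods([1_280, 1_280, 10_000, 50_000])
# methods == {1280: "exact", 10000: "exact", 50000: "approximate"}
```

Deduplicating the tip counts first reflects the point made in the thread: the cost is paid once per realized number of tips, not once per tree.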
Use a recursive approach to calculating prior variances that is a couple of orders of magnitude faster:
so it's possible to get the exact variances for 100,000 tips in a few minutes (by comparison, it's infeasible to get exact variances for more than 2,000 tips or so with the original algorithm).
Some indication that the two algorithms are equivalent, showing sqrt(variance) for 1280 tips:
Still trying to get my head around the interpolation scheme used for the approximation, and if this algorithm will be useful there (it recurses backwards from the maximum number of tips).