One-bit encoding for genotypes #809
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
- Coverage   93.34%   92.71%   -0.63%
==========================================
  Files          17       17
  Lines        5662     5806     +144
  Branches     1016     1036      +20
==========================================
+ Hits         5285     5383      +98
- Misses        247      291      +44
- Partials      130      132       +2
```
Flags with carried forward coverage won't be shown.
... and 1 file with indirect coverage changes
Looking good - let me know when I should run some quick tests.
I think this is basically working @benjeffery, but haven't quite got the tests to pass. Should be at least informative to run this on a decent sized SampleData, if you want to give it a quick try.
Use like:
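(A hypothetical sketch, since the original snippet isn't shown here; it assumes the one-bit option is exposed via a genotype_encoding argument on generate_ancestors, as in released tsinfer. The exact names are not confirmed by this thread.)

```python
import tsinfer

# Assumed API: genotype_encoding / GenotypeEncoding.ONE_BIT are taken from the
# released tsinfer interface, not copied from this PR.
sample_data = tsinfer.load("data.samples")
ancestor_data = tsinfer.generate_ancestors(
    sample_data,
    genotype_encoding=tsinfer.GenotypeEncoding.ONE_BIT,
)
```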
Aha, that should be it. Curious to see what the perf cost/benefit of this is!
Not looking good so far,
Hmm, I can see how that might happen. OK, I'll see what I can do.
This seems to be better:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand each packed byte of source into 8 one-byte values in dest (bit 7
 * first), using a 64-bit multiply-and-mask; writes 8 bytes per input byte. */
void unpackbits(const uint8_t *restrict source, size_t len, int8_t *restrict dest)
{
    uint64_t MAGIC = 0x8040201008040201ULL;
    uint64_t MASK = 0x8080808080808080ULL;
    size_t dest_index = 0;
    for (size_t i = 0; i < len; i++) {
        uint64_t t = ((MAGIC * source[i]) & MASK) >> 7;
        *(uint64_t *) &dest[dest_index] = t;
        dest_index += 8;
    }
}
```

It's running now, and has got past the …
Whoops, maybe that isn't doing the right thing, as some tests are failing now. It was running fast until it failed, though!
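(Aside, not from the thread: the bit ordering of the multiply-and-mask trick can be cross-checked against numpy like this; later comments suggest the real problem was buffer sizing rather than the decoded values.)

```python
import numpy as np

MAGIC = 0x8040201008040201
MASK = 0x8080808080808080

def magic_unpack_byte(b):
    # Same trick as the C snippet above: byte j of the 64-bit result holds
    # bit (7 - j) of the input byte, i.e. most significant bit first.
    t = ((MAGIC * int(b)) & MASK) >> 7
    return np.frombuffer(t.to_bytes(8, "little"), dtype=np.uint8)

bits = np.random.randint(0, 2, 64).astype(np.uint8)
packed = np.packbits(bits)
decoded = np.concatenate([magic_unpack_byte(b) for b in packed])
assert np.array_equal(decoded, np.unpackbits(packed))  # same MSB-first order
```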
Currently refactoring to do selective decoding in some places. We can drop a more optimised function in later.
Here's a version with partial decoding @benjeffery. You might see a change in the amount of memory we're grabbing at the start, hopefully that won't be a show stopper for the first pass though.
Looks like this is working pretty well. Here's my informal perf testing based on a simulation with 10K samples and 78K sites:
So, we're about as fast as the current head and save quite a bit of memory! The memory isn't optimised at the moment, I'll need to take a bit more care with that. But I think the approach basically works now and we don't really need any major changes, so I'll try to get this mergeable ASAP.
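(Back-of-envelope on where the memory saving comes from at that simulation size; these are not the measured figures from the run above.)

```python
# 10K samples x 78K sites: one byte per genotype vs one bit per genotype.
n_samples, n_sites = 10_000, 78_000
print(n_samples * n_sites / 1e6)      # ~780 MB stored as int8
print(n_samples * n_sites / 8 / 1e6)  # ~97.5 MB packed at 1 bit per genotype
```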
Good to get some confirmation on real data as well though @benjeffery, if you can get it working on the datasets you're working on?
Hmm, might have made a retrograde step with the memory usage, will have another look.
I also get:
Right, crap, we're reusing a decode buffer across threads. Can you try again with 1 thread?
That should fix it.
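(Not tsinfer's actual fix, just a hedged illustration of the pattern being described: each worker thread gets its own decode scratch buffer instead of sharing one.)

```python
import threading

import numpy as np

_local = threading.local()

def get_decode_buffer(num_samples):
    # Per-thread scratch space for decoded genotypes; a single shared buffer
    # would be clobbered by whichever thread decodes next.
    buf = getattr(_local, "buf", None)
    if buf is None or buf.shape[0] < num_samples:
        buf = np.empty(num_samples, dtype=np.int8)
        _local.buf = buf
    return buf
```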
Right, here's some results. Generating ancestors for 1000g with 8 threads (not all fully used):
So with the "magic" unpack I posted in #809 (comment) we're faster than the normal code, which I guess makes sense as you can hold more data in the CPU cache, and need less main memory bandwidth. The ~3.5GB RAM is a constant term from the ancestor write buffers I think, so shouldn't worry us. I will now try this code on the GEL chr 22.
Awesome! Since the speed diff isn't massive we can add the bitpack-magic change in a follow-up. It needs some care about buffer sizes, I think, and probably a few choice tests to make sure we don't overflow on awkward cases.
GEL chr22 ran in 3:05:51 using 76GB RAM on the bitpack-magic branch. Awesome work on this one, that's hopefully the last big blocker removed!
Great! Don't use the magic version for actual inference though, there's definitely a buffer overflow and undefined behaviour. Easy fix though (just a case of making sure the genotype encode buffer size is divisible by 64 rather than 8).
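(A sketch of that sizing rule, assuming the decoder writes eight output bytes per packed input byte as in the snippet above; the helper is illustrative, not code from the PR.)

```python
def padded_sizes(num_genotypes):
    # Round the genotype count up to a multiple of 64 so the packed buffer is
    # a whole number of 8-byte chunks and the 8-bytes-at-a-time unpack never
    # runs past the end of the decode buffer.
    padded = -(-num_genotypes // 64) * 64
    return padded // 8, padded  # (packed bytes, decoded bytes)

# With 1001 genotypes the padded sizes are (128, 1024); the naive (126, 1001)
# would let the unpack write 1008 bytes into a 1001-byte decode buffer.
print(padded_sizes(1001))
```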
BTW, presumably if we have missing data we can use a 2-bit encoding for twice the RAM cost? Would this be useful for others, or hard to code up? No reason to do it now, but maybe an issue for the future?
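(For context on the 2-bit idea, a hedged numpy sketch of one possible layout, not code from the PR: 0, 1 and a missing sentinel packed four genotypes per byte.)

```python
import numpy as np

MISSING = -1  # assumed sentinel for a missing genotype

def pack_2bit(genotypes):
    # Store missing as the third 2-bit value, then pack 4 genotypes per byte.
    g = np.where(genotypes == MISSING, 2, genotypes).astype(np.uint8)
    g = np.concatenate([g, np.zeros((-len(g)) % 4, dtype=np.uint8)]).reshape(-1, 4)
    return g[:, 0] | (g[:, 1] << 2) | (g[:, 2] << 4) | (g[:, 3] << 6)

def unpack_2bit(packed, n):
    out = np.stack([(packed >> (2 * j)) & 3 for j in range(4)], axis=1)
    out = out.reshape(-1)[:n].astype(np.int8)
    return np.where(out == 2, MISSING, out)

g = np.array([0, 1, -1, 1, 0], dtype=np.int8)
assert np.array_equal(unpack_2bit(pack_2bit(g), len(g)), g)
```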
Yeah, that's the plan.
Shall I open an issue?
Force-pushed from 77fdb30 to 30f4b49.
I think this is ready for final review and merging now. I've only added the most basic documentation, since we'll also want to consider the mmap stuff for doing this at scale, so we may as well wait till that's in to discuss perf in a more accessible way.
The implementation has changed slightly @benjeffery, so it may be worth rerunning some basic benchmarks just to be sure.
OK, will re-run the benchmarks on 1kg.
Can you open an issue to track adding your 64-bit unpack implementation? It should be a nice simple follow-up here.
I think this should be ready to go @benjeffery?
LGTM, couple of comments.
There is a block of uncovered code in the data_equal method. I assume that's because all the tests pass, but it might be nice to add one set that is unequal only by the last criterion to be checked?
Hopefully ready to go now
WIP
Doesn't look too bad if we abstract things a bit; I'll have a look at the C API to see if it ports across.