One-bit encoding for genotypes #809


Merged · 1 commit · Apr 5, 2023

Conversation

@jeromekelleher (Member)

WIP

Doesn't look too bad if we abstract things a bit, will have a look at the C API to see if it ports across.

@codecov (bot) commented Mar 4, 2023

Codecov Report

Merging #809 (dde4e26) into main (18b51ae) will decrease coverage by 0.63%.
The diff coverage is 83.11%.

❗ Current head dde4e26 differs from pull request most recent head 77e3466. Consider uploading reports for the commit 77e3466 to get more accurate results

@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
- Coverage   93.34%   92.71%   -0.63%     
==========================================
  Files          17       17              
  Lines        5662     5806     +144     
  Branches     1016     1036      +20     
==========================================
+ Hits         5285     5383      +98     
- Misses        247      291      +44     
- Partials      130      132       +2     
Flag     Coverage Δ
C        92.71% <83.11%> (-0.63%) ⬇️
python   95.90% <78.65%> (-0.39%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files           Coverage Δ
tsinfer/formats.py       96.28% <9.52%> (-1.25%) ⬇️
lib/ancestor_builder.c   83.12% <85.71%> (-4.46%) ⬇️
lib/err.c                100.00% <100.00%> (ø)
tsinfer/algorithm.py     98.56% <100.00%> (+0.08%) ⬆️
tsinfer/inference.py     98.57% <100.00%> (+0.01%) ⬆️

... and 1 file with indirect coverage changes


@benjeffery (Member)

Looking good - let me know when I should run some quick tests.

@jeromekelleher (Member Author)

I think this is basically working @benjeffery, but I haven't quite got the tests to pass. It should be at least informative to run this on a decent-sized SampleData, if you want to give it a quick try.

@jeromekelleher (Member Author)

Use like generate_ancestors(params as before, genotype_encoding=1)

@jeromekelleher (Member Author)

Aha, that should be it. Curious to see what the perf cost/benefit of this is!

@benjeffery (Member)

Not looking good so far: ancestor_builder_finalise is taking a lot longer. I'm not sure how much longer, as it's still running after 8 min (that step usually takes a few seconds). Attaching gdb shows the process is in unpackbits. I'll try a more optimised version of that function.

@jeromekelleher (Member Author)

Not looking good so far, ancestor_builder_finalise is taking a lot longer.

Hmm, I can see how that might happen. OK, I'll see what I can do.

@benjeffery (Member)

This seems to be better:

void unpackbits(const uint8_t *restrict source, size_t len, int8_t *restrict dest) {
    uint64_t MAGIC = 0x8040201008040201ULL;
    uint64_t MASK  = 0x8080808080808080ULL;
    size_t dest_index = 0;
    for (size_t i = 0; i < len; i++) {
        uint64_t t = ((MAGIC*source[i]) & MASK) >> 7;
        *(uint64_t*)&dest[dest_index] = t;
        dest_index += 8;
    }
}

It's running now, and has got past the ancestor_builder_finalise step.

@benjeffery (Member)

Whoops, maybe that isn't doing the right thing, as some tests are failing now. It was running fast until it failed, though!

@jeromekelleher (Member Author)

Currently refactoring to do selective decoding in some places. We can drop a more optimised function in later.

@jeromekelleher (Member Author)

Here's a version with partial decoding @benjeffery. You might see a change in the amount of memory we're grabbing at the start; hopefully that won't be a showstopper for the first pass, though.

@jeromekelleher (Member Author)

Looks like this is working pretty well. Here's my informal perf testing, based on a simulation with 10K samples and 78K sites:

  • main: time 1:20, max resident 760M
  • this branch, genotype_encoding=0: time 1:31, max resident 1.0G
  • this branch, genotype_encoding=1: time 1:20, max resident 460M

So, we're about as fast as the current head and save quite a bit of memory! The memory isn't optimised at the moment; I'll need to take a bit more care with that.

But I think the approach basically works now and we don't really need any major changes, so I'll try to get this mergeable ASAP.

@jeromekelleher (Member Author)

It would be good to get some confirmation on real data as well though @benjeffery, if you can get it working on the datasets you're working on?

@jeromekelleher (Member Author)

Hmm, might have made a retrograde step with the memory usage, will have another look.

@benjeffery (Member)

I also get:

CRITICAL:tsinfer.threads:Exception occured in thread; exiting
CRITICAL:tsinfer.threads:Traceback (most recent call last):
  File "/home/benj/projects/tsinfer/tsinfer/threads.py", line 59, in thread_target
    worker(index)
  File "/home/benj/projects/tsinfer/tsinfer/inference.py", line 1214, in build_worker
    start, end = self.ancestor_builder.make_ancestor(focal_sites, a)
_tsinfer.LibraryError: Bad focal site.

@jeromekelleher (Member Author) commented Mar 10, 2023

Right, crap, we're reusing a decode buffer across threads.

Can you try again with 1 thread?

@jeromekelleher (Member Author)

that should fix it

@benjeffery (Member)

Right, here's some results. Generating ancestors for 1000g with 8 threads (not all fully used):
(793,802 sites × 3,202 samples × 2 ploidy = 5.08GB byte-encoded / 635MB bit-packed)

Code version          wallclock   builder RAM   process resident RAM
1kg (byte)            8:31        n/a           7.2GB
bitpack (byte)        8:58        4.7GB         8.7GB
bitpack (1bit)        8:56        608.7MB       4.3GB
bitpack-magic (1bit)  6:58        608.7MB       4.0GB

So with the "magic" unpack I posted in #809 (comment) we're faster than the normal code, which I guess makes sense, as you can hold more data in the CPU cache and need less main-memory bandwidth. The ~3.5GB of RAM is a constant term from the ancestor write buffers, I think, so shouldn't worry us.

I will now try this code on the GEL chr 22.

@jeromekelleher (Member Author)

Awesome!

Since the speed diff isn't massive, we can add the bitpack-magic change in a follow-up. It needs some care about buffer sizes, I think, and probably a few choice tests to make sure we don't overflow on awkward cases.

@benjeffery (Member)

GEL chr22 in 3:05:51 using 76GB RAM using the bitpack-magic branch. Awesome work on this one, that's hopefully the last big blocker removed!

@jeromekelleher (Member Author)

Great!

Don't use the magic version for actual inference though; there's definitely a buffer overflow and undefined behaviour. Easy fix though (just a case of making sure the genotype encode buffer size is divisible by 64 rather than 8).

@hyanwong (Member)

BTW, presumably if we have missing data we can use a 2-bit encoding for twice the RAM cost? Would this be useful for others, or hard to code up? No reason to do it now, but maybe an issue for the future?

@jeromekelleher (Member Author)

Yeah, that's the plan

@hyanwong (Member)

Yeah, that's the plan

Shall I open an issue?

@jeromekelleher jeromekelleher force-pushed the bit-pack branch 2 times, most recently from 77fdb30 to 30f4b49 Compare March 28, 2023 14:50
@jeromekelleher jeromekelleher marked this pull request as ready for review March 28, 2023 14:50
@jeromekelleher (Member Author)

I think this is ready for final review and merging now.

I've only added the most basic documentation: since we'll also want to consider the mmap stuff for doing this at scale, we may as well wait till that's in before discussing perf in a more accessible way.

@jeromekelleher (Member Author)

The implementation has changed slightly @benjeffery, so may be worth rerunning some basic benchmarks just to be sure.

@benjeffery (Member)

OK, will re-run the benchmarks on 1kg.

@jeromekelleher (Member Author)

Can you open an issue to track adding your 64-bit unpack implementation? It should be a nice, simple follow-up here.

@benjeffery (Member)

#816

@jeromekelleher (Member Author)

I think this should be ready to go @benjeffery?

@benjeffery (Member) left a comment

LGTM, couple of comments.
There is a block of uncovered code in the data_equal method. I assume that's because all the tests pass, but it might be nice to add one set that is unequal by the last criterion to be checked?

@jeromekelleher (Member Author)

Hopefully ready to go now

@mergify mergify bot merged commit 2b6e898 into tskit-dev:main Apr 5, 2023
3 participants