faster ALP encode #924

lwwmanning · 2024-09-25T01:09:02Z

fixes #920

Consistently cuts median encoding time by roughly 10-50%.

Before the change:

```
Running benches/alp_compress.rs (target/release/deps/alp_compress-abbdaefc5eabf343)
Timer precision: 41 ns
alp_compress          fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ alp_compress                     │               │               │               │         │
│  ├─ f32                           │               │               │               │         │
│  │  ├─ 100000       191.9 µs      │ 824.9 µs      │ 314.7 µs      │ 354 µs        │ 100     │ 100
│  │  ╰─ 10000000     21.39 ms      │ 28.95 ms      │ 21.71 ms      │ 21.89 ms      │ 100     │ 100
│  ╰─ f64                           │               │               │               │         │
│     ├─ 100000       236 µs        │ 353.7 µs      │ 238.4 µs      │ 246.4 µs      │ 100     │ 100
│     ╰─ 10000000     28.78 ms      │ 68.68 ms      │ 29.49 ms      │ 29.93 ms      │ 100     │ 100
```

After:

```
Running benches/alp_compress.rs (target/release/deps/alp_compress-abbdaefc5eabf343)
Timer precision: 41 ns
alp_compress          fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ alp_compress                     │               │               │               │         │
│  ├─ f32                           │               │               │               │         │
│  │  ├─ 100000       161 µs        │ 234.6 µs      │ 163.3 µs      │ 166 µs        │ 100     │ 100
│  │  ╰─ 10000000     18.72 ms      │ 21.54 ms      │ 19.07 ms      │ 19.14 ms      │ 100     │ 100
│  ╰─ f64                           │               │               │               │         │
│     ├─ 100000       182 µs        │ 346 µs        │ 183.9 µs      │ 187.9 µs      │ 100     │ 100
│     ╰─ 10000000     23.98 ms      │ 28.71 ms      │ 24.52 ms      │ 24.53 ms      │ 100     │ 100
```
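As a quick check on the 10-50% figure, this small sketch recomputes the relative reduction in median time from the before/after tables. It is plain arithmetic on the reported medians (µs for the 100000-element rows, ms for the 10000000-element rows), not part of the benchmark harness.

```rust
/// Relative reduction in time, in percent, going from `before` to `after`
/// (both measured in the same unit).
fn reduction_pct(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}

fn main() {
    // (benchmark, median before, median after), copied from the tables above.
    let rows = [
        ("f32, 100000", 314.7, 163.3),   // µs
        ("f32, 10000000", 21.71, 19.07), // ms
        ("f64, 100000", 238.4, 183.9),   // µs
        ("f64, 10000000", 29.49, 24.52), // ms
    ];
    for (name, before, after) in rows {
        println!("{name}: {:.1}% less time", reduction_pct(before, after));
    }
}
```

This gives reductions of about 48%, 12%, 23%, and 17% in median time; the "fastest" columns show a narrower spread of roughly 12-23%.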

encodings/alp/src/alp.rs

a10y

nice!

robert3005

one small nit

encodings/alp/src/alp.rs

lwwmanning · 2024-09-25T23:32:43Z

encodings/alp/src/alp.rs

```diff
+    }
+
+    // if there are no patches, we are done
+    if chunk_patch_count == 0 {
```


Need to handle the edge case of two chunks where chunk 0 is all patches and chunk 1 has 0 patches; in that case, chunk 0's patched slots won't get filled.
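One way to avoid this edge case is to defer filling until every chunk has been encoded. The sketch below is a simplified, hypothetical stand-in for the chunked encoder in `encodings/alp/src/alp.rs`, not its actual code. It assumes a single decimal exponent (real ALP uses an exponent/factor pair), a toy chunk size of 4, and a fill value equal to the first successfully encoded integer. Because the fill runs once over all patch positions, a trailing chunk with zero patches cannot cause an earlier all-patch chunk to be skipped.

```rust
/// Toy chunk size for this sketch (hypothetical; not the real chunk length).
const CHUNK: usize = 4;

/// Output of a simplified ALP-style encode: integers plus exceptions ("patches").
#[derive(Debug)]
struct Encoded {
    ints: Vec<i64>,
    patch_indices: Vec<usize>,
    patch_values: Vec<f64>,
}

/// Simplified ALP-style encode with a single decimal exponent `exp`
/// (real ALP uses an exponent/factor pair).
fn encode_chunked(values: &[f64], exp: i32) -> Encoded {
    let scale = 10f64.powi(exp);
    let mut out = Encoded {
        ints: Vec::with_capacity(values.len()),
        patch_indices: Vec::new(),
        patch_values: Vec::new(),
    };

    for (chunk_idx, chunk) in values.chunks(CHUNK).enumerate() {
        for (i, &v) in chunk.iter().enumerate() {
            // `as` saturates out-of-range floats and maps NaN to 0.
            let enc = (v * scale).round() as i64;
            out.ints.push(enc);
            // Values that don't round-trip become patches.
            if enc as f64 / scale != v {
                out.patch_indices.push(chunk_idx * CHUNK + i);
                out.patch_values.push(v);
            }
        }
    }

    // Fill patched slots once, after all chunks are encoded, so a chunk with zero
    // patches can never leave an earlier chunk's patched slots unfilled.
    if !out.patch_indices.is_empty() {
        let mut is_patched = vec![false; values.len()];
        for &idx in &out.patch_indices {
            is_patched[idx] = true;
        }
        // Fill value: first successfully encoded integer, or 0 if everything is a patch.
        let fill = (0..values.len())
            .find(|&i| !is_patched[i])
            .map_or(0, |i| out.ints[i]);
        for &idx in &out.patch_indices {
            out.ints[idx] = fill;
        }
    }
    out
}

fn main() {
    // Chunk 0 is all patches at exp = 2; chunk 1 has none.
    let values = [1e300, f64::NAN, 0.123456, 7.0001, 1.25, 2.5, 3.75, 4.0];
    let enc = encode_chunked(&values, 2);
    println!("{:?} patches at {:?}", enc.ints, enc.patch_indices);
}
```

Running it prints `[125, 125, 125, 125, 125, 250, 375, 400] patches at [0, 1, 2, 3]`: every slot in the all-patch chunk is filled even though the following chunk has no patches.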

I realized there's an unhandled edge case in #924 ([commented here](https://github.com/spiraldb/vortex/pull/924/files#r1776099681)). Essentially, on develop, if there are two chunks, the first chunk is all patches, and the second chunk has 0 patches, then the patched values won't get filled in the encoded array. That's not the end of the world (those slots presumably hold integer approximations that don't round-trip), but if the patched values are large outliers, the encoded values will bit-pack poorly. This PR fixes that.
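To make the bit-packing cost concrete, this sketch computes the width a simple frame-of-reference bit-packer would need (subtract the minimum, then pack every value at one fixed width). The packing model is an illustrative assumption, not taken from the PR. A patched slot still holding a saturated integer approximation of a huge outlier forces 63 bits per value, while filling it with a neighboring encoded value keeps the width at 9 bits.

```rust
/// Bits per value needed to bit-pack `ints` after subtracting their minimum
/// (a simple frame-of-reference model used only for illustration).
fn packed_bit_width(ints: &[i64]) -> u32 {
    let (Some(&min), Some(&max)) = (ints.iter().min(), ints.iter().max()) else {
        return 0; // empty input
    };
    // Widen to i128 so `max - min` cannot overflow.
    let range = (max as i128 - min as i128) as u128;
    u128::BITS - range.leading_zeros()
}

fn main() {
    // A patched slot still holding a saturated integer approximation of a huge outlier...
    let unfilled = [i64::MAX, 125, 250, 375, 400];
    // ...versus the same slot filled with a neighboring encoded value.
    let filled = [125, 125, 250, 375, 400];
    println!("unfilled: {} bits/value", packed_bit_width(&unfilled));
    println!("filled:   {} bits/value", packed_bit_width(&filled));
}
```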

This PR trims invalid values from the patches and makes the patches' validity either AllValid (for nullable arrays) or NonNullable. This microbenchmark doesn't reveal any clear improvements or degradations; the differences look mostly like noise to me. In theory, this change should make decompression a bit faster because validity lives in one place, but my primary goal here is to make ALPArray simpler: validity is in one place, the encoded array.

### Benchmarks on latest commit

- PR: 7fb595b
- develop: 0a18498

The parameter is (number of elements, fraction patched, fraction valid), and ratio is PR median / develop median. Any ratio greater than 1.1 or less than 0.9 is marked with ` ***`.

```
alp_compress                    │ PR median │ develop median │ ratio
├─ compress_alp                 │           │                │
│  ├─ f32                       │           │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs  │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs  │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs  │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs  │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs  │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs  │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs  │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs  │ 269.8 µs       │ 0.8828 ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs  │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms  │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms  │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms   │ 12.47 ms       │ 1.1226 ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms  │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms  │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms  │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms  │ 27.72 ms       │ 0.8409 ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms   │ 27.47 ms       │ 0.8518 ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms  │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │           │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs  │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs  │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs  │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs  │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs  │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs  │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs  │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs  │ 315.3 µs       │ 0.8851 ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs  │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms  │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms  │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms   │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms  │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms  │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms  │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms  │ 30.73 ms       │ 0.8750 ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms  │ 30.53 ms       │ 0.8860 ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms  │ 25.98 ms       │ 1.0092
╰─ decompress_alp               │           │                │
   ├─ f32                       │           │                │
   │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs  │ 12.33 µs       │ 0.9927
   │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs  │ 12.16 µs       │ 1.0065
   │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs   │ 12.16 µs       │ 1.0032
   │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs  │ 14.04 µs       │ 1.0769
   │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs  │ 14.81 µs       │ 1.0094
   │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs  │ 13.24 µs       │ 1.0143
   │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs  │ 17.41 µs       │ 1.4979 ***
   │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs  │ 25.04 µs       │ 1.0331
   │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs  │ 21.08 µs       │ 0.9169
   │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms  │ 2.057 ms       │ 1.0048
   │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms  │ 2.055 ms       │ 1.0063
   │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms   │ 1.261 ms       │ 1.6415 ***
   │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms   │ 2.113 ms       │ 0.7146 ***
   │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms  │ 2.621 ms       │ 0.5635 ***
   │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms   │ 1.346 ms       │ 1.0029
   │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms  │ 2.58 ms        │ 1.4593 ***
   │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms  │ 3.28 ms        │ 0.8487 ***
   │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms  │ 1.754 ms       │ 1.0057
   ╰─ f64                       │           │                │
      ├─ (100000, 0.0, 0.25)    │ 23.33 µs  │ 23.45 µs       │ 0.9948
      ├─ (100000, 0.0, 0.95)    │ 23.41 µs  │ 23.33 µs       │ 1.0034
      ├─ (100000, 0.0, 1.0)     │ 23.33 µs  │ 23.49 µs       │ 0.9931
      ├─ (100000, 0.01, 0.25)   │ 25.58 µs  │ 24.66 µs       │ 1.0373
      ├─ (100000, 0.01, 0.95)   │ 25.58 µs  │ 25.79 µs       │ 0.9918
      ├─ (100000, 0.01, 1.0)    │ 24.2 µs   │ 24.62 µs       │ 0.9829
      ├─ (100000, 0.1, 0.25)    │ 39.83 µs  │ 27.87 µs       │ 1.4291 ***
      ├─ (100000, 0.1, 0.95)    │ 39.7 µs   │ 39.56 µs       │ 1.0035
      ├─ (100000, 0.1, 1.0)     │ 34.43 µs  │ 31.66 µs       │ 1.0874
      ├─ (10000000, 0.0, 0.25)  │ 4.246 ms  │ 4.239 ms       │ 1.0016
      ├─ (10000000, 0.0, 0.95)  │ 4.227 ms  │ 4.292 ms       │ 0.9848
      ├─ (10000000, 0.0, 1.0)   │ 4.227 ms  │ 4.246 ms       │ 0.9955
      ├─ (10000000, 0.01, 0.25) │ 4.696 ms  │ 4.356 ms       │ 1.0780
      ├─ (10000000, 0.01, 0.95) │ 4.933 ms  │ 4.637 ms       │ 1.0638
      ├─ (10000000, 0.01, 1.0)  │ 4.538 ms  │ 4.545 ms       │ 0.9984
      ├─ (10000000, 0.1, 0.25)  │ 7.23 ms   │ 5.304 ms       │ 1.3631 ***
      ├─ (10000000, 0.1, 0.95)  │ 6.227 ms  │ 5.913 ms       │ 1.0531
      ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms  │ 5.29 ms        │ 0.9843
```

### Benchmarks before reverting to develop's chunking code

<details>

[1] This PR seems about the same as develop except when compressing very large f64 arrays. The PR that introduced chunking, #924, reported a substantially larger reduction (~5 ms of 29 ms) than this increase (~1 ms of 17 ms).

```
alp_compress              │ PR median │ PR mean   │ develop median │ develop mean
├─ compress_alp           │           │           │                │
│  ├─ f32                 │           │           │                │
│  │  ├─ (100000, 0.25)   │ 136.4 µs  │ 137.9 µs  │ 143 µs         │ 145.9 µs
│  │  ├─ (100000, 0.95)   │ 136.3 µs  │ 137.1 µs  │ 133.1 µs       │ 134.3 µs
│  │  ├─ (100000, 1.0)    │ 136 µs    │ 137.3 µs  │ 133.6 µs       │ 134.6 µs
│  │  ├─ (10000000, 0.25) │ 13.54 ms  │ 13.67 ms  │ 13.74 ms       │ 13.84 ms
│  │  ├─ (10000000, 0.95) │ 13.54 ms  │ 13.64 ms  │ 13.49 ms       │ 13.59 ms
│  │  ╰─ (10000000, 1.0)  │ 13.47 ms  │ 13.57 ms  │ 13.58 ms       │ 13.73 ms
│  ╰─ f64                 │           │           │                │
│     ├─ (100000, 0.25)   │ 152.5 µs  │ 153.9 µs  │ 166.1 µs       │ 167.2 µs
│     ├─ (100000, 0.95)   │ 152.5 µs  │ 154.3 µs  │ 166.4 µs       │ 167 µs
│     ├─ (100000, 1.0)    │ 151.5 µs  │ 153 µs    │ 166.2 µs       │ 166.9 µs
│     ├─ (10000000, 0.25) │ 16.89 ms  │ 17 ms     │ 15.87 ms       │ 15.91 ms
│     ├─ (10000000, 0.95) │ 16.96 ms  │ 17.19 ms  │ 16.14 ms       │ 16.12 ms
│     ╰─ (10000000, 1.0)  │ 16.93 ms  │ 16.99 ms  │ 16.15 ms       │ 16.18 ms
╰─ decompress_alp         │           │           │                │
   ├─ f32                 │           │           │                │
   │  ├─ (100000, 0.25)   │ 12.33 µs  │ 12.4 µs   │ 12.37 µs       │ 12.55 µs
   │  ├─ (100000, 0.95)   │ 11.99 µs  │ 12.01 µs  │ 12.45 µs       │ 12.58 µs
   │  ├─ (100000, 1.0)    │ 11.95 µs  │ 11.98 µs  │ 11.91 µs       │ 11.96 µs
   │  ├─ (10000000, 0.25) │ 1.233 ms  │ 1.24 ms   │ 2.064 ms       │ 2.088 ms
   │  ├─ (10000000, 0.95) │ 1.232 ms  │ 1.235 ms  │ 2.063 ms       │ 2.094 ms
   │  ╰─ (10000000, 1.0)  │ 1.233 ms  │ 1.236 ms  │ 2.061 ms       │ 2.088 ms
   ╰─ f64                 │           │           │                │
      ├─ (100000, 0.25)   │ 23.29 µs  │ 23.46 µs  │ 23.33 µs       │ 23.4 µs
      ├─ (100000, 0.95)   │ 22.87 µs  │ 22.92 µs  │ 22.99 µs       │ 23.06 µs
      ├─ (100000, 1.0)    │ 22.87 µs  │ 23 µs     │ 22.95 µs       │ 23 µs
      ├─ (10000000, 0.25) │ 4.254 ms  │ 4.393 ms  │ 4.239 ms       │ 4.28 ms
      ├─ (10000000, 0.95) │ 4.703 ms  │ 4.639 ms  │ 4.27 ms        │ 4.437 ms
      ╰─ (10000000, 1.0)  │ 4.479 ms  │ 4.58 ms   │ 4.684 ms       │ 4.618 ms
```

</details>
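The validity change described at the top of this PR description (trim patches at null positions so that the patches are either AllValid for a nullable array or NonNullable) can be sketched as a filter over the patch positions. The `Patches` struct and `PatchValidity` enum here are hypothetical stand-ins that only mirror the names used in the description; they are not Vortex's actual types.

```rust
/// Hypothetical patch representation: parallel vectors of positions and exact values.
#[derive(Debug, PartialEq)]
struct Patches {
    indices: Vec<usize>,
    values: Vec<f64>,
}

/// Hypothetical stand-in mirroring the two validity states named in the PR description.
#[derive(Debug, PartialEq)]
enum PatchValidity {
    AllValid,
    NonNullable,
}

/// Drop patches at null positions so the patches carry no nulls of their own;
/// nullability is then tracked only by the encoded array's validity.
/// `validity` is `None` for a non-nullable array.
fn trim_invalid_patches(patches: Patches, validity: Option<&[bool]>) -> (Patches, PatchValidity) {
    match validity {
        None => (patches, PatchValidity::NonNullable),
        Some(mask) => {
            let (indices, values): (Vec<usize>, Vec<f64>) = patches
                .indices
                .into_iter()
                .zip(patches.values)
                .filter(|&(idx, _)| mask[idx])
                .unzip();
            (Patches { indices, values }, PatchValidity::AllValid)
        }
    }
}

fn main() {
    let validity = [true, false, true, true, false];
    let patches = Patches {
        indices: vec![1, 2, 4],
        values: vec![9e300, 0.123, f64::INFINITY],
    };
    let (trimmed, patch_validity) = trim_invalid_patches(patches, Some(&validity[..]));
    // Only the patch at position 2 survives; positions 1 and 4 are null.
    println!("{trimmed:?} {patch_validity:?}");
}
```

Dropping null patches up front means a reader of the decoded array never has to consult a second validity mask on the patches.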

lwwmanning added 9 commits September 24, 2024 17:41

improve ALP size estimation

2c0a148

fmt

bcd7761

fix test

ffd2b4d

branchless ALP

0aaedca

fixes

d250b7e

hopefully very fast

e666c12

fmt

470d51e

Merge remote-tracking branch 'origin/develop' into wm/alp

d45e256

works, but only 10% faster

43bee8d

lwwmanning changed the title ~~branchless ALP encode~~ faster ALP encode Sep 25, 2024
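The original title and the "branchless ALP" commit refer to making the encode loop branch-free. One common way to do that (a generic technique, assumed here rather than taken from the PR's code) is to write every index into a pre-sized exception buffer unconditionally and advance the write cursor only when a value fails to round-trip. The sketch again uses a single decimal exponent instead of ALP's exponent/factor pair, and `encode_branchless` is a hypothetical name. Whether the compiler actually emits branch-free code depends on the target and optimization level.

```rust
/// Simplified ALP-style encode with a single decimal exponent `exp`, collecting
/// the positions of values that don't round-trip without a data-dependent branch.
fn encode_branchless(values: &[f64], exp: i32) -> (Vec<i64>, Vec<usize>) {
    let scale = 10f64.powi(exp);
    let mut ints = Vec::with_capacity(values.len());
    // Worst case: every value is an exception. Pre-sizing lets the loop write unconditionally.
    let mut exceptions = vec![0usize; values.len()];
    let mut count = 0;
    for (i, &v) in values.iter().enumerate() {
        // `as` saturates out-of-range floats and maps NaN to 0.
        let enc = (v * scale).round() as i64;
        ints.push(enc);
        // Always write the index; only advance the cursor if the value is an exception.
        // `count <= i`, so this write is always in bounds.
        exceptions[count] = i;
        count += (enc as f64 / scale != v) as usize;
    }
    // Drop the speculative writes past the last real exception.
    exceptions.truncate(count);
    (ints, exceptions)
}

fn main() {
    let (ints, exceptions) = encode_branchless(&[1.25, 0.123456, 2.5, f64::NAN], 2);
    println!("{ints:?} {exceptions:?}");
}
```

This prints `[125, 12, 250, 0] [1, 3]`: 0.123456 and NaN do not round-trip at `exp = 2`, so they are recorded as exceptions.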

lwwmanning added 6 commits September 25, 2024 13:32

wip

b3f4d18

refactor

01a1e51

fix fill bug

954cbbf

Merge remote-tracking branch 'origin/develop' into wm/alp

6203672

fmt

c91b6aa

remove cruft

ff98c69

lwwmanning marked this pull request as ready for review September 25, 2024 14:35

AdamGS reviewed Sep 25, 2024

encodings/alp/src/alp.rs (outdated, resolved)

a10y reviewed Sep 25, 2024

encodings/alp/src/alp.rs (outdated, resolved)

CR feedback

d82e914

a10y approved these changes Sep 25, 2024

robert3005 approved these changes Sep 25, 2024

encodings/alp/src/alp.rs (resolved)

one more

614f02c

lwwmanning enabled auto-merge (squash) September 25, 2024 15:04

lwwmanning merged commit a7fd730 into develop Sep 25, 2024
5 checks passed

lwwmanning deleted the wm/branchless-alp branch September 25, 2024 15:17

lwwmanning commented Sep 25, 2024

lwwmanning mentioned this pull request Sep 26, 2024

fix: edge case in filling ALP encoded child on patches #939

Merged

danking mentioned this pull request Jan 30, 2025

feat: teach ALPArray to store validity only in the encoded array & other minor changes #2053

Closed

danking mentioned this pull request Feb 3, 2025

feat: teach ALPArray to store validity only in the encoded array #2216

Merged
