feat: teach ALPArray to store validity only in the encoded array #2216

danking · 2025-02-03T18:51:56Z

This PR trims invalid values from the patches and makes the patches' validity either AllValid (for nullable arrays) or NonNullable.

This microbenchmark doesn't reveal any clear improvements or regressions; the differences look mostly like noise to me. In theory, this change should make decompression a bit faster because validity lives in one place, but my primary goal is to make ALPArray simpler: validity lives in exactly one place, the encoded array.
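To make "validity lives in one place" concrete, here is a minimal sketch of decompression under that invariant. `AlpSketch` and its fields are hypothetical simplifications, not Vortex's actual `ALPArray` API; it assumes the usual ALP decoding rule `digits * 10^f * 10^-e` and that patches only ever cover valid positions, so they carry no validity of their own.

```rust
/// Hypothetical, simplified model of an ALP-encoded f64 array (not Vortex's real types).
/// Validity is stored only alongside `encoded`; the patches have none of their own.
struct AlpSketch {
    encoded: Vec<i64>,           // ALP-encoded integers (fill values at exceptional/null slots)
    validity: Option<Vec<bool>>, // None => non-nullable; Some(mask) => per-element validity
    exponent_e: i32,
    factor_f: i32,
    patch_indices: Vec<usize>, // positions of exceptional values (assumed all valid)
    patch_values: Vec<f64>,    // exact values for those positions
}

/// ALP decoding: digits * 10^f * 10^-e.
fn decode(digits: i64, e: i32, f: i32) -> f64 {
    digits as f64 * 10f64.powi(f) * 10f64.powi(-e)
}

impl AlpSketch {
    /// Decode every integer, overwrite patched positions, then apply the single
    /// validity source. No patch validity has to be merged in.
    fn decompress(&self) -> Vec<Option<f64>> {
        let mut values: Vec<f64> = self
            .encoded
            .iter()
            .map(|&d| decode(d, self.exponent_e, self.factor_f))
            .collect();
        for (&i, &v) in self.patch_indices.iter().zip(&self.patch_values) {
            values[i] = v;
        }
        match &self.validity {
            None => values.into_iter().map(Some).collect(),
            Some(mask) => values
                .into_iter()
                .zip(mask)
                .map(|(v, &ok)| ok.then_some(v))
                .collect(),
        }
    }
}
```

The design point is that the patch loop never reads or writes validity; the only validity lookup happens once, at the end.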

Benchmarks on the latest commit:

PR: 7fb595b
develop: 0a18498

Each parameter is a tuple: (number of elements, fraction patched, fraction valid).

Any ratio (PR median / develop median) greater than 1.1 or less than 0.9 is marked with ***.
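The ratio column can be reproduced from the two median columns (for example, 160.4 / 159.6 ≈ 1.005). A small sketch of the marking rule; the function names are illustrative only and not part of the benchmark harness:

```rust
/// Ratio of the PR median to the develop median; below 1.0 means the PR is faster.
fn ratio(pr_median: f64, develop_median: f64) -> f64 {
    pr_median / develop_median
}

/// The marker used in the tables: "***" when the ratio leaves the [0.9, 1.1] band.
fn flag(r: f64) -> &'static str {
    if r > 1.1 || r < 0.9 {
        "***"
    } else {
        ""
    }
}
```

For instance, the f32 `(100000, 0.1, 0.95)` compression row gives `ratio(238.2, 269.8)` ≈ 0.883, which is flagged.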

alp_compress                    │ PR median     │ develop median │ ratio
├─ compress_alp                 │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 160.4 µs      │ 159.6 µs       │ 1.0050
│  │  ├─ (100000, 0.0, 0.95)    │ 145.9 µs      │ 143.8 µs       │ 1.0146
│  │  ├─ (100000, 0.0, 1.0)     │ 137.0 µs      │ 135.5 µs       │ 1.0110
│  │  ├─ (100000, 0.01, 0.25)   │ 227.7 µs      │ 230.7 µs       │ 0.9869
│  │  ├─ (100000, 0.01, 0.95)   │ 227.9 µs      │ 227.2 µs       │ 1.0030
│  │  ├─ (100000, 0.01, 1.0)    │ 226.6 µs      │ 227.5 µs       │ 0.9960
│  │  ├─ (100000, 0.1, 0.25)    │ 238.3 µs      │ 248.9 µs       │ 0.9574
│  │  ├─ (100000, 0.1, 0.95)    │ 238.2 µs      │ 269.8 µs       │ 0.8828  ***
│  │  ├─ (100000, 0.1, 1.0)     │ 230.6 µs      │ 231.9 µs       │ 0.9943
│  │  ├─ (10000000, 0.0, 0.25)  │ 14.17 ms      │ 13.77 ms       │ 1.0290
│  │  ├─ (10000000, 0.0, 0.95)  │ 14.16 ms      │ 13.8 ms        │ 1.0260
│  │  ├─ (10000000, 0.0, 1.0)   │ 14.0 ms       │ 12.47 ms       │ 1.1226  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 22.29 ms      │ 23.13 ms       │ 0.9636
│  │  ├─ (10000000, 0.01, 0.95) │ 22.26 ms      │ 23.78 ms       │ 0.9360
│  │  ├─ (10000000, 0.01, 1.0)  │ 22.19 ms      │ 21.79 ms       │ 1.0183
│  │  ├─ (10000000, 0.1, 0.25)  │ 23.31 ms      │ 27.72 ms       │ 0.8409  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 23.4 ms       │ 27.47 ms       │ 0.8518  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 22.99 ms      │ 22.31 ms       │ 1.0304
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 165.2 µs      │ 165.4 µs       │ 0.9987
│     ├─ (100000, 0.0, 0.95)    │ 166.1 µs      │ 163.4 µs       │ 1.0165
│     ├─ (100000, 0.0, 1.0)     │ 164.7 µs      │ 179.9 µs       │ 0.9155
│     ├─ (100000, 0.01, 0.25)   │ 269.7 µs      │ 259.1 µs       │ 1.0409
│     ├─ (100000, 0.01, 0.95)   │ 270.5 µs      │ 259.6 µs       │ 1.0419
│     ├─ (100000, 0.01, 1.0)    │ 268.9 µs      │ 270.6 µs       │ 0.9937
│     ├─ (100000, 0.1, 0.25)    │ 281.7 µs      │ 281.3 µs       │ 1.0014
│     ├─ (100000, 0.1, 0.95)    │ 279.1 µs      │ 315.3 µs       │ 0.8851  ***
│     ├─ (100000, 0.1, 1.0)     │ 273.0 µs      │ 275.7 µs       │ 0.9902
│     ├─ (10000000, 0.0, 0.25)  │ 16.16 ms      │ 15.86 ms       │ 1.0189
│     ├─ (10000000, 0.0, 0.95)  │ 16.19 ms      │ 15.75 ms       │ 1.0279
│     ├─ (10000000, 0.0, 1.0)   │ 16.2 ms       │ 15.83 ms       │ 1.0233
│     ├─ (10000000, 0.01, 0.25) │ 25.29 ms      │ 25.77 ms       │ 0.9813
│     ├─ (10000000, 0.01, 0.95) │ 25.74 ms      │ 25.94 ms       │ 0.9922
│     ├─ (10000000, 0.01, 1.0)  │ 25.54 ms      │ 25.32 ms       │ 1.0086
│     ├─ (10000000, 0.1, 0.25)  │ 26.89 ms      │ 30.73 ms       │ 0.8750  ***
│     ├─ (10000000, 0.1, 0.95)  │ 27.05 ms      │ 30.53 ms       │ 0.8860  ***
│     ╰─ (10000000, 0.1, 1.0)   │ 26.22 ms      │ 25.98 ms       │ 1.0092
├─ decompress_alp               │               │                │
│  ├─ f32                       │               │                │
│  │  ├─ (100000, 0.0, 0.25)    │ 12.24 µs      │ 12.33 µs       │ 0.9927
│  │  ├─ (100000, 0.0, 0.95)    │ 12.24 µs      │ 12.16 µs       │ 1.0065
│  │  ├─ (100000, 0.0, 1.0)     │ 12.2 µs       │ 12.16 µs       │ 1.0032
│  │  ├─ (100000, 0.01, 0.25)   │ 15.12 µs      │ 14.04 µs       │ 1.0769
│  │  ├─ (100000, 0.01, 0.95)   │ 14.95 µs      │ 14.81 µs       │ 1.0094
│  │  ├─ (100000, 0.01, 1.0)    │ 13.43 µs      │ 13.24 µs       │ 1.0143
│  │  ├─ (100000, 0.1, 0.25)    │ 26.08 µs      │ 17.41 µs       │ 1.4979  ***
│  │  ├─ (100000, 0.1, 0.95)    │ 25.87 µs      │ 25.04 µs       │ 1.0331
│  │  ├─ (100000, 0.1, 1.0)     │ 19.33 µs      │ 21.08 µs       │ 0.9169
│  │  ├─ (10000000, 0.0, 0.25)  │ 2.067 ms      │ 2.057 ms       │ 1.0048
│  │  ├─ (10000000, 0.0, 0.95)  │ 2.068 ms      │ 2.055 ms       │ 1.0063
│  │  ├─ (10000000, 0.0, 1.0)   │ 2.07 ms       │ 1.261 ms       │ 1.6415  ***
│  │  ├─ (10000000, 0.01, 0.25) │ 1.51 ms       │ 2.113 ms       │ 0.7146  ***
│  │  ├─ (10000000, 0.01, 0.95) │ 1.477 ms      │ 2.621 ms       │ 0.5635  ***
│  │  ├─ (10000000, 0.01, 1.0)  │ 1.35 ms       │ 1.346 ms       │ 1.0029
│  │  ├─ (10000000, 0.1, 0.25)  │ 3.765 ms      │ 2.58 ms        │ 1.4593  ***
│  │  ├─ (10000000, 0.1, 0.95)  │ 2.784 ms      │ 3.28 ms        │ 0.8487  ***
│  │  ╰─ (10000000, 0.1, 1.0)   │ 1.764 ms      │ 1.754 ms       │ 1.0057
│  ╰─ f64                       │               │                │
│     ├─ (100000, 0.0, 0.25)    │ 23.33 µs      │ 23.45 µs       │ 0.9948
│     ├─ (100000, 0.0, 0.95)    │ 23.41 µs      │ 23.33 µs       │ 1.0034
│     ├─ (100000, 0.0, 1.0)     │ 23.33 µs      │ 23.49 µs       │ 0.9931
│     ├─ (100000, 0.01, 0.25)   │ 25.58 µs      │ 24.66 µs       │ 1.0373
│     ├─ (100000, 0.01, 0.95)   │ 25.58 µs      │ 25.79 µs       │ 0.9918
│     ├─ (100000, 0.01, 1.0)    │ 24.2 µs       │ 24.62 µs       │ 0.9829
│     ├─ (100000, 0.1, 0.25)    │ 39.83 µs      │ 27.87 µs       │ 1.4291  ***
│     ├─ (100000, 0.1, 0.95)    │ 39.7 µs       │ 39.56 µs       │ 1.0035
│     ├─ (100000, 0.1, 1.0)     │ 34.43 µs      │ 31.66 µs       │ 1.0874
│     ├─ (10000000, 0.0, 0.25)  │ 4.246 ms      │ 4.239 ms       │ 1.0016
│     ├─ (10000000, 0.0, 0.95)  │ 4.227 ms      │ 4.292 ms       │ 0.9848
│     ├─ (10000000, 0.0, 1.0)   │ 4.227 ms      │ 4.246 ms       │ 0.9955
│     ├─ (10000000, 0.01, 0.25) │ 4.696 ms      │ 4.356 ms       │ 1.0780
│     ├─ (10000000, 0.01, 0.95) │ 4.933 ms      │ 4.637 ms       │ 1.0638
│     ├─ (10000000, 0.01, 1.0)  │ 4.538 ms      │ 4.545 ms       │ 0.9984
│     ├─ (10000000, 0.1, 0.25)  │ 7.23 ms       │ 5.304 ms       │ 1.3631  ***
│     ├─ (10000000, 0.1, 0.95)  │ 6.227 ms      │ 5.913 ms       │ 1.0531
│     ╰─ (10000000, 0.1, 1.0)   │ 5.207 ms      │ 5.29 ms        │ 0.9843

Benchmarks before reverting to develop's chunking code:

This PR seems about the same as develop except when compressing very large f64 arrays. The PR that introduced chunking, #924, reported a substantially larger reduction in time (~5 ms of 29 ms) than this increase (~1 ms of 17 ms). In this earlier run, each parameter is (number of elements, fraction valid).
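Putting both numbers on the same relative scale makes the comparison clearer. This sketch assumes "~5 ms of 29 ms" means roughly 29 ms → 24 ms for #924, and uses the `(10000000, 1.0)` f64 compression medians from the table below (develop 16.15 ms, PR 16.93 ms):

```rust
/// Relative change from `before` to `after`; negative means faster.
fn relative_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms
}
```

That is roughly a 17% reduction from #924 versus roughly a 5% increase here.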

alp_compress               │ PR median     │ PR mean   │ develop median │ develop mean │
├─ compress_alp            │               │           │                │              │
│  ├─ f32                  │               │           │                │              │
│  │  ├─ (100000, 0.25)    │ 136.4 µs      │ 137.9 µs  │ 143 µs         │ 145.9 µs     │
│  │  ├─ (100000, 0.95)    │ 136.3 µs      │ 137.1 µs  │ 133.1 µs       │ 134.3 µs     │
│  │  ├─ (100000, 1.0)     │ 136 µs        │ 137.3 µs  │ 133.6 µs       │ 134.6 µs     │
│  │  ├─ (10000000, 0.25)  │ 13.54 ms      │ 13.67 ms  │ 13.74 ms       │ 13.84 ms     │
│  │  ├─ (10000000, 0.95)  │ 13.54 ms      │ 13.64 ms  │ 13.49 ms       │ 13.59 ms     │
│  │  ╰─ (10000000, 1.0)   │ 13.47 ms      │ 13.57 ms  │ 13.58 ms       │ 13.73 ms     │
│  ╰─ f64                  │               │           │                │              │
│     ├─ (100000, 0.25)    │ 152.5 µs      │ 153.9 µs  │ 166.1 µs       │ 167.2 µs     │
│     ├─ (100000, 0.95)    │ 152.5 µs      │ 154.3 µs  │ 166.4 µs       │ 167 µs       │
│     ├─ (100000, 1.0)     │ 151.5 µs      │ 153 µs    │ 166.2 µs       │ 166.9 µs     │
│     ├─ (10000000, 0.25)  │ 16.89 ms      │ 17 ms     │ 15.87 ms       │ 15.91 ms     │
│     ├─ (10000000, 0.95)  │ 16.96 ms      │ 17.19 ms  │ 16.14 ms       │ 16.12 ms     │
│     ╰─ (10000000, 1.0)   │ 16.93 ms      │ 16.99 ms  │ 16.15 ms       │ 16.18 ms     │
╰─ decompress_alp          │               │           │                │              │
   ├─ f32                  │               │           │                │              │
   │  ├─ (100000, 0.25)    │ 12.33 µs      │ 12.4 µs   │ 12.37 µs       │ 12.55 µs     │
   │  ├─ (100000, 0.95)    │ 11.99 µs      │ 12.01 µs  │ 12.45 µs       │ 12.58 µs     │
   │  ├─ (100000, 1.0)     │ 11.95 µs      │ 11.98 µs  │ 11.91 µs       │ 11.96 µs     │
   │  ├─ (10000000, 0.25)  │ 1.233 ms      │ 1.24 ms   │ 2.064 ms       │ 2.088 ms     │
   │  ├─ (10000000, 0.95)  │ 1.232 ms      │ 1.235 ms  │ 2.063 ms       │ 2.094 ms     │
   │  ╰─ (10000000, 1.0)   │ 1.233 ms      │ 1.236 ms  │ 2.061 ms       │ 2.088 ms     │
   ╰─ f64                  │               │           │                │              │
      ├─ (100000, 0.25)    │ 23.29 µs      │ 23.46 µs  │ 23.33 µs       │ 23.4 µs      │
      ├─ (100000, 0.95)    │ 22.87 µs      │ 22.92 µs  │ 22.99 µs       │ 23.06 µs     │
      ├─ (100000, 1.0)     │ 22.87 µs      │ 23 µs     │ 22.95 µs       │ 23 µs        │
      ├─ (10000000, 0.25)  │ 4.254 ms      │ 4.393 ms  │ 4.239 ms       │ 4.28 ms      │
      ├─ (10000000, 0.95)  │ 4.703 ms      │ 4.639 ms  │ 4.27 ms        │ 4.437 ms     │
      ╰─ (10000000, 1.0)   │ 4.479 ms      │ 4.58 ms   │ 4.684 ms       │ 4.618 ms     │

The patches are now always non-nullable. This required `PrimitiveArray::patch` to gracefully handle non-nullable patches when the array is nullable. I modified the benchmarks to include patch-manipulation time, but note that this test data has no patches, so the benchmarks mostly measure the overhead of `is_valid`. If the test data had exceptional values at invalid positions, I would expect a modest improvement in both compression and decompression time.
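One way to read "gracefully handle non-nullable patches when the array is nullable" is sketched below. `Prim`, `NonNullPatches`, and this `patch` method are hypothetical stand-ins, not Vortex's `PrimitiveArray::patch`; the assumption is that a non-null patch value simply makes its position valid. When patches only cover positions that are already valid, as in this PR, the array's validity comes out unchanged.

```rust
/// Hypothetical stand-in for a primitive array: values plus optional validity.
#[derive(Debug, PartialEq)]
struct Prim {
    values: Vec<f64>,
    validity: Option<Vec<bool>>, // None => non-nullable
}

/// Hypothetical patch set whose values are non-nullable.
struct NonNullPatches {
    indices: Vec<usize>,
    values: Vec<f64>,
}

impl Prim {
    /// Apply non-nullable patches to a possibly nullable array.
    fn patch(mut self, patches: &NonNullPatches) -> Prim {
        for (&i, &v) in patches.indices.iter().zip(&patches.values) {
            self.values[i] = v;
            if let Some(mask) = self.validity.as_mut() {
                // A non-nullable patch value is valid by definition.
                mask[i] = true;
            }
        }
        self
    }
}
```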


gatesn · 2025-02-03T19:34:14Z

encodings/alp/src/alp/array.rs

+                vortex_bail!(MismatchedTypes: dtype, patches.dtype());
+            }
+
+            if patches.values().validity_mask()?.false_count() != 0 {


Calling `validity_mask` here triggers a "canonicalization" of the validity buffer. You should instead use `patches.values().all_valid()?`, which should short-circuit when possible.

Done, thanks.
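The difference the review points at can be sketched with a hypothetical `Validity` enum (not Vortex's real type; the real `all_valid` returns a `VortexResult<bool>`, hence the `?`). Building a full mask allocates one entry per element even when the answer is known from the variant alone, while `all_valid` answers directly and only scans when a bitmap actually exists.

```rust
/// Hypothetical validity representation.
enum Validity {
    NonNullable,
    AllValid,
    Array(Vec<bool>),
}

impl Validity {
    /// Expensive path: materialize a full mask of `len` entries.
    fn to_mask(&self, len: usize) -> Vec<bool> {
        match self {
            Validity::NonNullable | Validity::AllValid => vec![true; len],
            Validity::Array(bits) => {
                debug_assert_eq!(bits.len(), len);
                bits.clone()
            }
        }
    }

    /// Cheap path: answer from the variant; scan (stopping at the first null) only for `Array`.
    fn all_valid(&self) -> bool {
        match self {
            Validity::NonNullable | Validity::AllValid => true,
            Validity::Array(bits) => bits.iter().all(|&b| b),
        }
    }
}

/// Mirrors the original check, `validity_mask()?.false_count() != 0`.
fn false_count(mask: &[bool]) -> usize {
    mask.iter().filter(|&&b| !b).count()
}
```

Both checks agree on the answer; only the cost differs.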

gatesn · 2025-02-03T19:34:28Z

encodings/alp/Cargo.toml

@@ -17,6 +17,7 @@ readme = { workspace = true }
 workspace = true

 [dependencies]
+arrow-array = { workspace = true }


This is suspect?

Indeed cruft. Removed.

gatesn · 2025-02-03T19:36:29Z

encodings/alp/src/alp/compress.rs

+    // exceptional_positions may contain exceptions at invalid positions (which contain garbage
+    // data). We remove invalid exceptional positions in order to keep the Patches small.
+    let (valid_exceptional_positions, valid_exceptional_values): (Buffer<u64>, Buffer<T>) =
+        if n_valid == 0 {


I think you can `match validity.boolean_buffer()` to switch over the all-true, all-false, and buffer cases.
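A minimal sketch of that suggestion, applied to the trimming step described in the diff comment above (dropping exceptions at invalid positions to keep the patches small). `ValidityBits` and `trim_exceptions` are hypothetical names, and `Vec` stands in for Vortex's `Buffer`:

```rust
/// Hypothetical view of validity with the three cases to match on.
enum ValidityBits<'a> {
    AllTrue,
    AllFalse,
    Buffer(&'a [bool]),
}

/// Keep only exceptions at valid positions so the patches stay small.
fn trim_exceptions<T: Copy>(
    validity: ValidityBits<'_>,
    positions: &[u64],
    values: &[T],
) -> (Vec<u64>, Vec<T>) {
    match validity {
        // Every position is valid: keep all exceptions.
        ValidityBits::AllTrue => (positions.to_vec(), values.to_vec()),
        // Every position is null: no exception is worth storing.
        ValidityBits::AllFalse => (Vec::new(), Vec::new()),
        // Mixed: drop exceptions whose position is null (garbage data).
        ValidityBits::Buffer(bits) => positions
            .iter()
            .zip(values)
            .filter(|&(&p, _)| bits[p as usize])
            .map(|(&p, &v)| (p, v))
            .unzip(),
    }
}
```

The two constant cases avoid touching a bitmap at all, which is the point of matching instead of always materializing one.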

gatesn · 2025-02-03T21:38:48Z

encodings/alp/src/alp/compress.rs

        );
        assert_eq!(encoded.exponents(), Exponents { e: 16, f: 13 });

        let decoded = decompress(encoded).unwrap();
        assert_eq!(values.as_slice(), decoded.as_slice::<f64>());
    }

+    #[test]
+    #[allow(clippy::approx_constant)] // Clippy objects to 2.718, an approximation of e, the base of the natural logarithm.


That's so funny

danking added 18 commits February 3, 2025 12:44

irrelevant comment

0cbf5fd

unnecessary condition

1a85002

use values_slice instead of calling as_slice again

c63f641

fix

02e08c3

Revert "unnecessary condition"

677c2e3

This reverts commit f26139f.

restore fill values

866e8fa

fix tests

2ad7fff

remove fill null zero

933043c

clippy

89e7c5c

restore test for all null

46a3693

fix the null round trip test

df608e8

sort fraction_valid

bb57ac0

final fixes

02b3f53

use zero as fill value if the entire chunk is patches

0b9697b

revert mod

d93ac13

finish revert

include some patch values in test

210c5e0

include patches

7fb595b
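The "restore fill values" and "use zero as fill value if the entire chunk is patches" commits concern what gets written into the encoded integers at exceptional positions. A sketch under an assumption not stated in the PR: the fill value is the first non-exceptional encoded value in the chunk, so exceptions don't widen the integer range that is later bit-packed. The function names are hypothetical.

```rust
/// Assumed rule: reuse the first non-exceptional encoded value; fall back to
/// zero when every value in the chunk is an exception (i.e. a patch).
fn fill_value(encoded: &[i64], is_exception: &[bool]) -> i64 {
    encoded
        .iter()
        .zip(is_exception)
        .find(|&(_, &exc)| !exc)
        .map(|(&v, _)| v)
        .unwrap_or(0)
}

/// Overwrite the exceptional slots (whose real values live in the patches).
fn apply_fill(encoded: &mut [i64], is_exception: &[bool]) {
    let fill = fill_value(encoded, is_exception);
    for (v, &exc) in encoded.iter_mut().zip(is_exception) {
        if exc {
            *v = fill;
        }
    }
}
```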

danking requested review from a10y and gatesn February 3, 2025 18:52

danking marked this pull request as ready for review February 3, 2025 18:56

gatesn requested changes Feb 3, 2025

address comments

0537de7

danking requested a review from gatesn February 3, 2025 21:05

gatesn reviewed Feb 3, 2025

gatesn approved these changes Feb 3, 2025

danking merged commit 2bf84b0 into develop Feb 3, 2025
21 checks passed

danking deleted the dk/alp-validity-in-encoded-only-v2 branch February 3, 2025 23:10
