Quantile Compression/PCodec claims 35%-71% better compression than zstd.
I've integrated the Rust library into TurboPFor via its FFI bindings for comparison purposes.
We use the synthetic datasets provided in the Quantile Compression repository, plus other real data with large integers.
Because real data with values larger than 32 bits are uncommon, we use 32-bit integers where possible rather than 64-bit integers for every file. Note that some files compress better with delta encoding, or with the integrated zigzag delta in conjunction with TurboTranspose. Download icapp, test with your own data and convince yourself.
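The zigzag delta + TurboTranspose combination mentioned above can be modeled roughly as follows. This is a simplified Python sketch, not TurboPFor's C/SIMD code. The function names are hypothetical, wrapping the deltas modulo 2^32 is an assumption, and zlib stands in for zstd only because zstd is not in the Python standard library.

```python
import zlib

MOD = 1 << 32

def wrap32(x):
    """Wrap an integer into the signed 32-bit range (two's complement)."""
    return ((x + (1 << 31)) % MOD) - (1 << 31)

def zigzag_delta(values):
    """Delta-encode signed 32-bit values, then zigzag-map each delta so that
    small negative and small positive deltas both become small unsigned codes."""
    codes, prev = [], 0
    for v in values:
        d = wrap32(v - prev)
        prev = v
        codes.append(2 * d if d >= 0 else -2 * d - 1)  # 0,-1,1,-2,2 -> 0,1,2,3,4
    return codes

def zigzag_undelta(codes):
    """Inverse of zigzag_delta."""
    values, prev = [], 0
    for z in codes:
        d = z >> 1 if z % 2 == 0 else -((z + 1) >> 1)
        prev = wrap32(prev + d)
        values.append(prev)
    return values

def byte_transpose(words, width=4):
    """Store byte 0 of every word, then byte 1 of every word, and so on."""
    raw = b"".join(w.to_bytes(width, "little") for w in words)
    n = len(words)
    return bytes(raw[i * width + b] for b in range(width) for i in range(n))

def byte_untranspose(data, width=4):
    """Inverse of byte_transpose."""
    n = len(data) // width
    raw = bytearray(len(data))
    for b in range(width):
        for i in range(n):
            raw[i * width + b] = data[b * n + i]
    return [int.from_bytes(raw[i * width:(i + 1) * width], "little") for i in range(n)]

if __name__ == "__main__":
    # Hypothetical input: slowly increasing values with small jitter
    values = [1_000_000 + 3 * i + (i % 5) for i in range(10_000)]
    plain = b"".join(v.to_bytes(4, "little", signed=True) for v in values)
    transformed = byte_transpose(zigzag_delta(values))
    print("zlib(raw):", len(zlib.compress(plain, 9)))
    print("zlib(zigzag delta + transpose):", len(zlib.compress(transformed, 9)))
    assert zigzag_undelta(byte_untranspose(transformed)) == values
```

Transposition places the high-order bytes of all codes next to each other. For small deltas these bytes are mostly zero, which gives the following LZ/entropy stage long, easy runs to compress.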
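- 32-bit integers:
Better total compression (16261417 vs 16731437 bytes) and several times faster decompression with TurboTranspose+zstd. Quantile Compression produces smaller output on most individual files, but TurboTranspose+zstd wins by a wide margin on i64_interleaved.txt and i64_misordered.txt.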

`icapp i64*.txt -Ftu -e81 -Ezstd,22`: integer size=32 bits (lz=zstd,22), function 81:Lztp Byte Transpose +zstd,22

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 450889 | 11.27% | 19 | 6070 | i64_cents.txt |
| 182 | 0.0046% | 239 | 28485 | i64_constant.txt |
| 631400 | 15.79% | 5 | 5837 | i64_dollars.txt |
| 2693750 | 67.34% | 12 | 4575 | i64_geo1M.txt |
| 251570 | 6.29% | 14 | 7335 | i64_geo2.txt |
| 1028913 | 25.72% | 12 | 2728 | i64_interleaved.txt |
| 1640375 | 41.01% | 4 | 2774 | i64_lomax15.txt |
| 1592074 | 39.80% | 12 | 3271 | i64_lomax25.txt |
| 2006291 | 50.16% | 5 | 983 | i64_misordered.txt |
| 419053 | 10.48% | 6 | 3915 | i64_normal1.txt |
| 815743 | 20.39% | 8 | 3719 | i64_normal10.txt |
| 2898888 | 72.47% | 6 | 3349 | i64_normal1M.txt |
| 404996 | 10.12% | 5 | 3687 | i64_slow_cosine.txt |
| 16027 | 0.40% | 8 | 20169 | i64_sparse.txt |
| 1411267 | 35.28% | 5 | 2033 | i64_total_cents.txt |
| 16261417 | | | | Total |

`icapp i64*.txt -Ftu -e173`: integer size=32 bits, function 173:qcomp quantile compress

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 450451 | 11.26% | 189 | 431 | i64_cents.txt |
| 44 | 0.0011% | 744 | 549 | i64_constant.txt |
| 620064 | 15.50% | 159 | 448 | i64_dollars.txt |
| 2676957 | 66.92% | 76 | 324 | i64_geo1M.txt |
| 250467 | 6.26% | 212 | 571 | i64_geo2.txt |
| 2253101 | 56.33% | 92 | 400 | i64_interleaved.txt |
| 1575073 | 39.38% | 98 | 373 | i64_lomax15.txt |
| 1545171 | 38.63% | 102 | 398 | i64_lomax25.txt |
| 2253103 | 56.33% | 78 | 313 | i64_misordered.txt |
| 282581 | 7.06% | 233 | 451 | i64_normal1.txt |
| 676116 | 16.90% | 161 | 452 | i64_normal10.txt |
| 2754383 | 68.86% | 74 | 336 | i64_normal1M.txt |
| 221218 | 5.53% | 269 | 534 | i64_slow_cosine.txt |
| 14323 | 0.36% | 718 | 2614 | i64_sparse.txt |
| 1158386 | 28.96% | 95 | 229 | i64_total_cents.txt |
| 16731437 | | | | Total |
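The 32-bit comparison can be checked with a little arithmetic on the icapp numbers above. This is a sketch: the lists are copied from the two runs in file order, and the 4,000,000-byte input size is an assumption inferred from the ratio column (each ratio equals size / 4,000,000), not something icapp states.

```python
# D MB/s per file, copied from the two 32-bit runs (same file order,
# i64_cents.txt first, i64_total_cents.txt last)
TT_ZSTD_DEC = [6070, 28485, 5837, 4575, 7335, 2728, 2774, 3271,
               983, 3915, 3719, 3349, 3687, 20169, 2033]
QCOMP_DEC = [431, 549, 448, 324, 571, 400, 373, 398,
             313, 451, 452, 336, 534, 2614, 229]

# Total compressed sizes in bytes
TT_ZSTD_TOTAL = 16261417
QCOMP_TOTAL = 16731437

# Assumption inferred from the ratio column: every input file is 4,000,000 bytes
ORIGINAL_BYTES = 4_000_000

def ratio_percent(size, original=ORIGINAL_BYTES):
    """icapp's ratio column: compressed size as a percentage of the input size."""
    return 100.0 * size / original

# How many times faster TurboTranspose+zstd decompresses each file
speedups = [t / q for t, q in zip(TT_ZSTD_DEC, QCOMP_DEC)]
# How much larger the qcomp total is, in percent
size_gap_percent = 100.0 * (QCOMP_TOTAL / TT_ZSTD_TOTAL - 1)

print(f"ratio(i64_cents, TurboTranspose+zstd) = {ratio_percent(450889):.2f}%")
print(f"decompression speedup: {min(speedups):.1f}x .. {max(speedups):.1f}x")
print(f"qcomp total is {size_gap_percent:.1f}% larger")
```

On these numbers the decompression speedup ranges from about 3.1x (i64_misordered.txt) to 51.9x (i64_constant.txt), and the qcomp total is about 2.9% larger.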
- Floating point (64 bits):
Quantile Compression/PCodec compresses slightly better (29221134 vs 29604584 bytes in total), but its decompression is a lot slower (up to 2-3x) than zstd.

`icapp f64*.txt -Ftd -e80 -Ezstd,22`: floating point size=64 bits (lz=zstd,22) unsorted -1, function 80:Lz zstd,22

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 2412121 | 30.15% | 3 | 1621 | f64_decimal_long.txt |
| 9111 | 37.96% | 7 | 913 | f64_decimal_short.txt |
| 4970116 | 62.13% | 5 | 1377 | f64_edge_cases.txt |
| 4247812 | 53.10% | 3 | 729 | f64_integers.txt |
| 7670370 | 95.88% | 8 | 1348 | f64_normal_at_0.txt |
| 6221137 | 77.76% | 3 | 1212 | f64_normal_at_1000.txt |
| 4073918 | 50.92% | 6 | 992 | f64_slow_cosine.txt |
| 29604584 | | | | Total |

`icapp f64*.txt -Ftd -e173`: floating point size=64 bits, function 173:qcomp quantile compress

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 6686504 | 83.58% | 189 | 605 | f64_decimal_long.txt |
| 20134 | 83.89% | 8 | 652 | f64_decimal_short.txt |
| 4364570 | 54.56% | 172 | 540 | f64_edge_cases.txt |
| 3754251 | 46.93% | 133 | 675 | f64_integers.txt |
| 6943689 | 86.80% | 131 | 518 | f64_normal_at_0.txt |
| 5638910 | 70.49% | 138 | 551 | f64_normal_at_1000.txt |
| 1813077 | 22.66% | 155 | 493 | f64_slow_cosine.txt |
| 29221134 | | | | Total |
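- Timestamps (64 bits):
Quantile Compression is much better on micros_millis but worse on micros_near_linear, so the totals are close (6239549 vs 6185355 bytes, slightly in favor of TurboTranspose+zstd). Its decompression is a lot slower (4-6x) than TurboTranspose+zstd.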

`icapp micro*.* -FtT -e173`: integer size=64 bits, function 173:qcomp quantile compress

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 2497182 | 31.21% | 140 | 640 | micros_millis.txt.ts |
| 3742368 | 46.78% | 195 | 793 | micros_near_linear.txt.ts |
| 6239549 | | | | Total |

`icapp micro*.* -FtT -e81 -Ezstd,22`: integer size=64 bits, function 81:Lztp Byte Transpose +zstd,22

| size | ratio | E MB/s | D MB/s | file |
|---:|---:|---:|---:|---|
| 3385201 | 42.32% | 16 | 4089 | micros_millis.txt.ts |
| 2800155 | 35.00% | 21 | 3367 | micros_near_linear.txt.ts |
| 6185355 | | | | Total |
- Non-synthetic datasets, plus LZ77 offset output: test1_demo (text) and test3_demo (binary). These are typical data with a mix of small, medium and large integers.
As the general-purpose codec (icapp -E) we use "zstd,15"; TurboVLC is additionally paired with "turborc,56" (entropy coding only, using adaptive Asymmetric Numeral Systems).
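To illustrate why a variable-length code followed by an entropy coder suits a mix of small, medium and large integers, here is a rough Python sketch. TurboVLC's exact bit layout is not reproduced here. As an assumption, the sketch uses the generic "bucket token + raw extra bits" scheme (the same idea zstd uses for its offset codes), and instead of implementing rANS it only estimates the size an ideal order-0 entropy coder would reach on the token stream. All function names are hypothetical.

```python
import math
from collections import Counter

def vlc_split(v):
    """Split a non-negative integer into (token, extra_bits, n_extra_bits).
    token = bit length of v (0 for v == 0); the extra bits are v without its
    leading 1 bit, so small values get small tokens and few extra bits."""
    if v == 0:
        return 0, 0, 0
    n = v.bit_length()
    return n, v - (1 << (n - 1)), n - 1

def vlc_join(token, extra):
    """Inverse of vlc_split."""
    return 0 if token == 0 else (1 << (token - 1)) + extra

def estimated_bits(values):
    """Ideal size in bits if the tokens are coded at their empirical order-0
    entropy (which an adaptive entropy coder such as rANS roughly approaches)
    and the extra bits are stored raw."""
    parts = [vlc_split(v) for v in values]
    counts = Counter(tok for tok, _, _ in parts)
    total = len(parts)
    token_bits = sum(-c * math.log2(c / total) for c in counts.values())
    extra_bits = sum(n for _, _, n in parts)
    return token_bits + extra_bits

if __name__ == "__main__":
    data = [3, 1, 700, 2, 65_000, 5, 1, 40]  # hypothetical mix of small, medium and large ints
    assert all(vlc_join(t, e) == v for v in data for t, e, _ in [vlc_split(v)])
    print(f"{estimated_bits(data):.1f} bits vs {32 * len(data)} raw bits")
```

The token alphabet is tiny (at most 33 symbols for 32-bit values), so the entropy coder only has to model the magnitude distribution, while the low-order bits, which are close to random, are passed through unchanged.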
Quantile Compression is not competitive here, and its decompression is several times (up to 60x) slower.
TurboVLC+rANS compresses better than Quantile Compression on every file and is faster at both compression and decompression.

`icapp -Ezstd,15 CCNEWS-RLZ-D64-FLENS.txt -Ftu -e81,96,80,173,3` (integer size=32 bits)

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 22145289 | 5.54% | 29 | 3525 | 81:Lztp Byte Transpose +zstd,15 |
| 23693811 | 5.92% | 32 | 2743 | 96:vlccomp TurboVLC +zstd,15 |
| 29382157 | 7.35% | 9 | 3536 | 80:Lz zstd,15 |
| 59957497 | 14.99% | 367 | 692 | 96:vlccomp TurboVLC +turborc,56 (=rANS) |
| 62529619 | 15.63% | 164 | 345 | 173:qcomp quantile compress |
| 77585707 | 19.40% | 1820 | 11324 | 3:p4nenc256v32 TurboPFor256 |

`icapp -Ezstd,15 CCNEWS-RLZ-D64-FOFFSETS.txt -Ftu -e81,96,80,173,3`

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 93751603 | 23.44% | 19 | 2745 | 80:Lz zstd,15 |
| 283069853 | 70.77% | 56 | 2622 | 96:vlccomp TurboVLC +zstd,15 |
| 322425616 | 80.61% | 338 | 651 | 96:vlccomp TurboVLC +turborc,56 (=rANS) |
| 323345103 | 80.84% | 73 | 219 | 173:qcomp quantile compress |
| 325331435 | 81.33% | 2444 | 10740 | 3:p4nenc256v32 TurboPFor256 |

`icapp -Ezstd,15 news-docs.2016-WORD.txt -Ftu -e81,96,80,173,3`

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 142677882 | 35.67% | 4 | 1444 | 80:Lz zstd,15 |
| 145450083 | 36.36% | 37 | 1546 | 96:vlccomp TurboVLC +zstd,15 |
| 148119568 | 37.03% | 11 | 1550 | 81:Lztp Byte Transpose +zstd,15 |
| 151616778 | 37.90% | 313 | 605 | 96:vlccomp TurboVLC +turborc,56 |
| 189513565 | 47.38% | 82 | 212 | 173:qcomp quantile compress |
| 181946393 | 45.49% | 1580 | 7641 | 3:p4nenc256v32 TurboPFor256 |

`icapp -Ezstd,15 news-docs.2016-WORD-BWTMTF.txt -Ftu -e81,96,80,173,3`

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 103706209 | 25.93% | 306 | 558 | 96:vlccomp TurboVLC +turborc,56 |
| 105855336 | 26.46% | 127 | 303 | 173:qcomp quantile compress |
| 105872416 | 26.47% | 29 | 1251 | 96:vlccomp TurboVLC +zstd,15 |
| 116101605 | 29.03% | 11 | 1745 | 81:Lztp Byte Transpose +zstd,15 |
| 136893715 | 34.22% | 4 | 1319 | 80:Lz zstd,15 |
| 135115053 | 33.78% | 1561 | 9292 | 3:p4nenc256v32 TurboPFor256 |

`icapp -Ezstd,15 test1_demo_o.u32 -e81,96,80,173,3`

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 71858387 | 65.93% | 332 | 650 | 96:vlccomp TurboVLC +turborc,56 |
| 72044472 | 66.10% | 214 | 2036 | 96:vlccomp TurboVLC +zstd,15 |
| 72142814 | 66.19% | 74 | 242 | 173:qcomp quantile compress |
| 77852927 | 71.43% | 7 | 1324 | 81:Lztp Byte Transpose +zstd,15 |
| 78282925 | 71.82% | 1364 | 8745 | 3:p4nenc256v32 TurboPFor256 |
| 84333237 | 77.37% | 6 | 1007 | 80:Lz zstd,15 |

`icapp -Ezstd,15 test3_demo_o.u32 -e81,96,80,173,3`

| size | ratio | E MB/s | D MB/s | function |
|---:|---:|---:|---:|---|
| 15946736 | 34.18% | 11 | 13588 | 81:Lztp Byte Transpose +zstd,15 |
| 16182167 | 34.68% | 8 | 1293 | 80:Lz zstd,15 |
| 17707120 | 37.95% | 41 | 1807 | 96:vlccomp TurboVLC +zstd,15 |
| 17734852 | 38.01% | 321 | 637 | 96:vlccomp TurboVLC +turborc,56 |
| 20344975 | 43.60% | 97 | 226 | 173:qcomp quantile compress |
| 22870847 | 49.01% | 1500 | 8905 | 3:p4nenc256v32 TurboPFor256 |
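The decompression gap on the non-synthetic data can be quantified directly from the D MB/s values in the icapp runs above. This is a small sketch: the numbers are copied by hand, and "fastest" means the fastest decoder in each individual run.

```python
# Decompression speeds (D MB/s) copied from the six icapp runs above:
# (fastest codec in the run, TurboVLC+turborc,56, qcomp)
RUNS = {
    "CCNEWS-RLZ-D64-FLENS": (11324, 692, 345),
    "CCNEWS-RLZ-D64-FOFFSETS": (10740, 651, 219),
    "news-docs.2016-WORD": (7641, 605, 212),
    "news-docs.2016-WORD-BWTMTF": (9292, 558, 303),
    "test1_demo_o": (8745, 650, 242),
    "test3_demo_o": (13588, 637, 226),
}

# How many times slower qcomp decodes than each reference codec
vs_fastest = {name: fast / q for name, (fast, _, q) in RUNS.items()}
vs_rans = {name: rans / q for name, (_, rans, q) in RUNS.items()}

for name in RUNS:
    print(f"{name:28s} {vs_fastest[name]:5.1f}x {vs_rans[name]:4.1f}x")
```

On these numbers qcomp decodes about 31-60x slower than the fastest decoder in each run (TurboPFor256, or TurboTranspose+zstd on test3_demo) and about 1.8-3x slower than TurboVLC+rANS.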