@brian-pane

No description provided.

@brian-pane
Author

Before and after:

Benchmark 1 (69 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          73.2ms ± 1.03ms    72.3ms … 79.6ms          4 ( 6%)        0%
  peak_rss           26.6MB ± 63.9KB    26.4MB … 26.7MB          1 ( 1%)        0%
  cpu_cycles          282M  ± 1.17M      280M  …  289M           1 ( 1%)        0%
  instructions        549M  ±  441       549M  …  549M           1 ( 1%)        0%
  cache_references    266K  ± 11.4K      261K  …  331K           5 ( 7%)        0%
  cache_misses        232K  ± 8.08K      206K  …  244K           8 (12%)        0%
  branch_misses      2.88M  ± 8.15K     2.87M  … 2.91M           3 ( 4%)        0%
Benchmark 2 (72 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          69.7ms ±  440us    69.0ms … 70.8ms          2 ( 3%)        ⚡-  4.7% ±  0.4%
  peak_rss           26.6MB ± 67.4KB    26.5MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          267M  ±  720K      265M  …  269M           1 ( 1%)        ⚡-  5.2% ±  0.1%
  instructions        522M  ±  233       522M  …  522M           1 ( 1%)        ⚡-  4.9% ±  0.0%
  cache_references    266K  ± 7.48K      261K  …  313K          11 (15%)          -  0.3% ±  1.2%
  cache_misses        231K  ± 8.70K      205K  …  240K           8 (11%)          -  0.3% ±  1.2%
  branch_misses      2.85M  ± 10.9K     2.83M  … 2.87M           0 ( 0%)          -  1.1% ±  0.1%

@codecov

codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag                             Coverage Δ
fuzz-compress                    ?
fuzz-decompress                  ?
test-aarch64-apple-darwin        93.38% <100.00%> (-0.01%) ⬇️
test-x86_64-apple-darwin         91.69% <100.00%> (+<0.01%) ⬆️
test-x86_64-unknown-linux-gnu    90.47% <100.00%> (+<0.01%) ⬆️

Files with missing lines                   Coverage Δ
zlib-rs/src/deflate/algorithm/quick.rs     96.77% <100.00%> (-0.91%) ⬇️
zlib-rs/src/deflate/hash_calc.rs           100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

@brian-pane
Author

This is an idea I've been thinking about for a while. It's possible, but complicated, to do this with the higher compression levels, so I started with the simpler deflate_quick to try out the concept.

Member

@folkertdev left a comment

Wow, that's awesome. I left a comment below to mitigate the effects on other compression levels (they all use the same hashing function, so it's a good idea to check them too).

Self::quick_insert_value(state, string, val)
}

pub fn quick_insert_value(state: &mut State, string: usize, val: u32) -> u16 {
Member

For me, this needs an #[inline]; otherwise level 2 regresses by a couple of percent. Interestingly, I then see some improvements for the other compression levels too (mostly in instructions), maybe because different (better) inlining decisions are made.
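
To make the suggestion concrete, here is a minimal sketch of the pattern being discussed, not the actual zlib-rs code. It assumes a toy `State` with an owned window and a small hash table; the names `HASH_BITS`, `hash`, `read_u32`, and the fields `window` and `head` are hypothetical and exist only for illustration. The point is that a caller that has already read the 4 bytes at a position can pass them to `quick_insert_value` directly, and that marking the small helper `#[inline]` hints to the compiler to inline it into every caller.

```rust
// Illustrative only: a simplified stand-in for the compressor state.
// The real zlib-rs `State` and hash function are different.
const HASH_BITS: u32 = 15; // hypothetical table size (2^15 buckets)

pub struct State {
    pub window: Vec<u8>, // input bytes being compressed
    pub head: Vec<u16>,  // most recent position seen for each hash bucket
}

impl State {
    pub fn new(window: Vec<u8>) -> Self {
        Self { window, head: vec![0; 1 << HASH_BITS] }
    }
}

/// Reads 4 little-endian bytes starting at `pos` (hypothetical helper).
#[inline]
pub fn read_u32(window: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([window[pos], window[pos + 1], window[pos + 2], window[pos + 3]])
}

/// Multiplicative hash of a 4-byte value (an assumption, not the library's hash).
#[inline]
fn hash(val: u32) -> usize {
    (val.wrapping_mul(2654435761) >> (32 - HASH_BITS)) as usize
}

/// Reads the 4 bytes at `string` and delegates to `quick_insert_value`.
#[inline]
pub fn quick_insert_string(state: &mut State, string: usize) -> u16 {
    let val = read_u32(&state.window, string);
    quick_insert_value(state, string, val)
}

/// Inserts position `string` under the hash of `val`, where `val` is a
/// 4-byte value the caller has already read. Returns the position that
/// previously occupied the bucket (0 if the bucket was empty).
#[inline]
pub fn quick_insert_value(state: &mut State, string: usize, val: u32) -> u16 {
    let h = hash(val);
    let prev = state.head[h];
    state.head[h] = string as u16;
    prev
}
```

A caller that needs the bytes for another purpose, such as comparing against a candidate match, can read them once with `read_u32` and then call `quick_insert_value`, instead of letting `quick_insert_string` read the same bytes again. Note that `#[inline]` is only a hint; the measured effect depends on the compiler's other inlining decisions, which is consistent with the level-dependent results reported in this thread.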

@brian-pane
Author

With the #[inline] added, the results on my test system are:

Benchmark 1 (69 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          73.3ms ± 1.77ms    72.1ms … 86.9ms          2 ( 3%)        0%
  peak_rss           26.6MB ± 68.7KB    26.5MB … 26.8MB          0 ( 0%)        0%
  cpu_cycles          282M  ±  747K      280M  …  283M           3 ( 4%)        0%
  instructions        549M  ±  273       549M  …  549M           0 ( 0%)        0%
  cache_references    265K  ± 6.58K      260K  …  311K           3 ( 4%)        0%
  cache_misses        230K  ± 8.12K      195K  …  247K           6 ( 9%)        0%
  branch_misses      2.88M  ± 9.40K     2.86M  … 2.91M           1 ( 1%)        0%
Benchmark 2 (72 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          69.8ms ±  568us    69.1ms … 71.5ms          4 ( 6%)        ⚡-  4.7% ±  0.6%
  peak_rss           26.6MB ± 64.0KB    26.5MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          267M  ±  663K      266M  …  271M           1 ( 1%)        ⚡-  5.1% ±  0.1%
  instructions        524M  ±  294       524M  …  524M           0 ( 0%)        ⚡-  4.6% ±  0.0%
  cache_references    265K  ± 3.94K      260K  …  281K           6 ( 8%)          -  0.0% ±  0.7%
  cache_misses        230K  ± 8.40K      191K  …  241K           7 (10%)          +  0.1% ±  1.2%
  branch_misses      2.85M  ± 4.86K     2.83M  … 2.86M           1 ( 1%)        ⚡-  1.3% ±  0.1%
Benchmark 1 (42 runs): ./blogpost-compress-baseline 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ± 1.06ms     119ms …  124ms          3 ( 7%)        0%
  peak_rss           24.9MB ± 57.4KB    24.8MB … 25.0MB          0 ( 0%)        0%
  cpu_cycles          486M  ± 2.99M      484M  …  504M           1 ( 2%)        0%
  instructions       1.07G  ±  283      1.07G  … 1.07G           1 ( 2%)        0%
  cache_references    269K  ± 5.36K      264K  …  291K           3 ( 7%)        0%
  cache_misses        233K  ± 7.27K      209K  …  245K           6 (14%)        0%
  branch_misses      6.18M  ± 5.05K     6.17M  … 6.19M           0 ( 0%)        0%
Benchmark 2 (42 runs): ./target/release/examples/blogpost-compress 2 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           120ms ±  924us     119ms …  124ms          3 ( 7%)          +  0.2% ±  0.4%
  peak_rss           24.9MB ± 67.1KB    24.7MB … 25.0MB          1 ( 2%)          -  0.1% ±  0.1%
  cpu_cycles          487M  ± 1.28M      485M  …  489M           0 ( 0%)          +  0.1% ±  0.2%
  instructions       1.07G  ±  386      1.07G  … 1.07G           4 (10%)          +  0.0% ±  0.0%
  cache_references    268K  ± 4.72K      263K  …  289K           3 ( 7%)          -  0.4% ±  0.8%
  cache_misses        234K  ± 5.38K      213K  …  246K           3 ( 7%)          +  0.4% ±  1.2%
  branch_misses      6.19M  ± 7.92K     6.17M  … 6.20M           0 ( 0%)          +  0.1% ±  0.0%
Benchmark 1 (37 runs): ./blogpost-compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ±  746us     136ms …  139ms          2 ( 5%)        0%
  peak_rss           24.7MB ± 69.8KB    24.6MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          561M  ± 1.89M      559M  …  569M           1 ( 3%)        0%
  instructions       1.40G  ±  373      1.40G  … 1.40G           1 ( 3%)        0%
  cache_references    270K  ± 5.25K      264K  …  290K           4 (11%)        0%
  cache_misses        232K  ± 8.90K      207K  …  245K           6 (16%)        0%
  branch_misses      7.06M  ± 5.46K     7.05M  … 7.07M           0 ( 0%)        0%
Benchmark 2 (37 runs): ./target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           137ms ±  645us     136ms …  138ms          0 ( 0%)          -  0.0% ±  0.2%
  peak_rss           24.7MB ± 77.3KB    24.5MB … 24.8MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          560M  ± 1.58M      558M  …  566M           1 ( 3%)          -  0.1% ±  0.1%
  instructions       1.40G  ±  346      1.40G  … 1.40G           2 ( 5%)          +  0.0% ±  0.0%
  cache_references    273K  ± 14.5K      265K  …  340K           4 (11%)          +  1.1% ±  1.9%
  cache_misses        231K  ± 8.99K      206K  …  241K           7 (19%)          -  0.2% ±  1.8%
  branch_misses      7.04M  ± 4.09K     7.03M  … 7.05M           0 ( 0%)          -  0.4% ±  0.0%
Benchmark 1 (32 runs): ./blogpost-compress-baseline 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           160ms ± 1.50ms     159ms …  165ms          3 ( 9%)        0%
  peak_rss           24.5MB ± 87.8KB    24.4MB … 24.7MB          1 ( 3%)        0%
  cpu_cycles          662M  ± 4.91M      659M  …  681M           5 (16%)        0%
  instructions       1.50G  ±  317      1.50G  … 1.50G           0 ( 0%)        0%
  cache_references    274K  ± 13.8K      265K  …  336K           5 (16%)        0%
  cache_misses        235K  ± 6.20K      206K  …  246K           3 ( 9%)        0%
  branch_misses      7.65M  ± 4.45K     7.64M  … 7.66M           1 ( 3%)        0%
Benchmark 2 (32 runs): ./target/release/examples/blogpost-compress 4 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           160ms ± 3.38ms     158ms …  175ms          5 (16%)          +  0.4% ±  0.8%
  peak_rss           24.5MB ± 74.9KB    24.3MB … 24.7MB          1 ( 3%)          -  0.1% ±  0.2%
  cpu_cycles          663M  ± 10.4M      658M  …  712M           6 (19%)          +  0.2% ±  0.6%
  instructions       1.50G  ±  277      1.50G  … 1.50G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    277K  ± 26.4K      265K  …  398K           2 ( 6%)          +  1.4% ±  3.8%
  cache_misses        235K  ± 8.44K      204K  …  248K           7 (22%)          +  0.2% ±  1.6%
  branch_misses      7.56M  ± 22.4K     7.54M  … 7.67M           2 ( 6%)        ⚡-  1.2% ±  0.1%
Benchmark 1 (29 runs): ./blogpost-compress-baseline 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           175ms ± 1.01ms     174ms …  178ms          1 ( 3%)        0%
  peak_rss           24.5MB ± 88.6KB    24.3MB … 24.7MB          1 ( 3%)        0%
  cpu_cycles          728M  ±  728K      727M  …  730M           1 ( 3%)        0%
  instructions       1.73G  ±  557      1.73G  … 1.73G           2 ( 7%)        0%
  cache_references    270K  ± 6.47K      265K  …  295K           3 (10%)        0%
  cache_misses        234K  ± 6.63K      209K  …  246K           3 (10%)        0%
  branch_misses      8.21M  ± 5.09K     8.20M  … 8.22M           0 ( 0%)        0%
Benchmark 2 (29 runs): ./target/release/examples/blogpost-compress 5 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           176ms ±  599us     175ms …  178ms          1 ( 3%)          +  0.5% ±  0.3%
  peak_rss           24.6MB ± 58.5KB    24.4MB … 24.7MB          1 ( 3%)          +  0.1% ±  0.2%
  cpu_cycles          733M  ± 1.44M      731M  …  737M           0 ( 0%)          +  0.7% ±  0.1%
  instructions       1.73G  ±  318      1.73G  … 1.73G           2 ( 7%)          +  0.0% ±  0.0%
  cache_references    270K  ± 7.28K      264K  …  297K           3 (10%)          +  0.0% ±  1.3%
  cache_misses        233K  ± 6.86K      208K  …  240K           2 ( 7%)          -  0.2% ±  1.5%
  branch_misses      8.28M  ± 36.3K     8.23M  … 8.36M           0 ( 0%)          +  0.8% ±  0.2%
Benchmark 1 (24 runs): ./blogpost-compress-baseline 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ±  810us     214ms …  218ms          1 ( 4%)        0%
  peak_rss           24.5MB ± 77.8KB    24.4MB … 24.7MB          0 ( 0%)        0%
  cpu_cycles          902M  ± 1.28M      900M  …  905M           1 ( 4%)        0%
  instructions       1.90G  ±  369      1.90G  … 1.90G           2 ( 8%)        0%
  cache_references    273K  ± 7.42K      266K  …  301K           1 ( 4%)        0%
  cache_misses        234K  ± 7.23K      211K  …  241K           2 ( 8%)        0%
  branch_misses      8.35M  ± 6.15K     8.34M  … 8.36M           1 ( 4%)        0%
Benchmark 2 (24 runs): ./target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           217ms ± 1.05ms     216ms …  220ms          1 ( 4%)          +  0.9% ±  0.3%
  peak_rss           24.5MB ± 84.6KB    24.4MB … 24.7MB          0 ( 0%)          -  0.0% ±  0.2%
  cpu_cycles          908M  ± 1.13M      906M  …  910M           0 ( 0%)          +  0.7% ±  0.1%
  instructions       1.90G  ±  353      1.90G  … 1.90G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    271K  ± 7.60K      266K  …  293K           3 (13%)          -  0.5% ±  1.6%
  cache_misses        235K  ± 6.84K      208K  …  246K           4 (17%)          +  0.4% ±  1.8%
  branch_misses      8.45M  ± 34.4K     8.39M  … 8.51M           2 ( 8%)        💩+  1.3% ±  0.2%
Benchmark 1 (17 runs): ./blogpost-compress-baseline 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           299ms ± 1.03ms     298ms …  300ms          0 ( 0%)        0%
  peak_rss           24.4MB ± 65.2KB    24.3MB … 24.5MB          0 ( 0%)        0%
  cpu_cycles         1.25G  ± 2.00M     1.25G  … 1.26G           1 ( 6%)        0%
  instructions       2.30G  ±  297      2.30G  … 2.30G           0 ( 0%)        0%
  cache_references    275K  ± 8.87K      265K  …  296K           0 ( 0%)        0%
  cache_misses        235K  ± 6.50K      216K  …  241K           2 (12%)        0%
  branch_misses      9.52M  ± 9.74K     9.51M  … 9.54M           1 ( 6%)        0%
Benchmark 2 (17 runs): ./target/release/examples/blogpost-compress 7 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           300ms ± 1.42ms     299ms …  304ms          1 ( 6%)          +  0.3% ±  0.3%
  peak_rss           24.4MB ± 71.6KB    24.3MB … 24.5MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles         1.26G  ± 3.56M     1.26G  … 1.27G           1 ( 6%)          +  0.3% ±  0.2%
  instructions       2.30G  ±  268      2.30G  … 2.30G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    275K  ± 10.6K      266K  …  309K           1 ( 6%)          +  0.1% ±  2.5%
  cache_misses        237K  ± 6.98K      212K  …  247K           2 (12%)          +  1.0% ±  2.0%
  branch_misses      9.56M  ± 12.8K     9.53M  … 9.58M           0 ( 0%)          +  0.4% ±  0.1%
Benchmark 1 (13 runs): ./blogpost-compress-baseline 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           394ms ± 1.11ms     393ms …  397ms          0 ( 0%)        0%
  peak_rss           24.4MB ± 61.8KB    24.4MB … 24.6MB          1 ( 8%)        0%
  cpu_cycles         1.66G  ± 1.34M     1.66G  … 1.66G           0 ( 0%)        0%
  instructions       2.75G  ±  344      2.75G  … 2.75G           1 ( 8%)        0%
  cache_references    275K  ± 10.3K      266K  …  304K           1 ( 8%)        0%
  cache_misses        234K  ± 9.75K      217K  …  250K           0 ( 0%)        0%
  branch_misses      9.65M  ± 11.3K     9.63M  … 9.67M           0 ( 0%)        0%
Benchmark 2 (13 runs): ./target/release/examples/blogpost-compress 8 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           396ms ± 1.74ms     394ms …  399ms          0 ( 0%)          +  0.3% ±  0.3%
  peak_rss           24.4MB ± 74.7KB    24.2MB … 24.5MB          1 ( 8%)          -  0.3% ±  0.2%
  cpu_cycles         1.66G  ± 3.41M     1.66G  … 1.67G           1 ( 8%)          +  0.3% ±  0.1%
  instructions       2.75G  ±  450      2.75G  … 2.75G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    288K  ± 39.1K      264K  …  412K           1 ( 8%)          +  4.7% ±  8.4%
  cache_misses        237K  ± 7.25K      216K  …  244K           1 ( 8%)          +  1.1% ±  3.0%
  branch_misses      9.69M  ± 8.48K     9.67M  … 9.70M           0 ( 0%)          +  0.4% ±  0.1%
Benchmark 1 (12 runs): ./blogpost-compress-baseline 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           421ms ± 3.65ms     418ms …  430ms          1 ( 8%)        0%
  peak_rss           24.4MB ± 75.3KB    24.2MB … 24.5MB          0 ( 0%)        0%
  cpu_cycles         1.77G  ± 9.52M     1.77G  … 1.79G           2 (17%)        0%
  instructions       3.38G  ±  225      3.38G  … 3.38G           3 (25%)        0%
  cache_references    278K  ± 10.2K      267K  …  303K           0 ( 0%)        0%
  cache_misses        239K  ± 4.79K      231K  …  248K           0 ( 0%)        0%
  branch_misses      15.5M  ± 36.9K     15.5M  … 15.6M           0 ( 0%)        0%
Benchmark 2 (12 runs): ./target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           421ms ± 1.42ms     420ms …  424ms          0 ( 0%)          -  0.0% ±  0.6%
  peak_rss           24.4MB ± 81.1KB    24.2MB … 24.5MB          4 (33%)          +  0.2% ±  0.3%
  cpu_cycles         1.77G  ± 1.75M     1.77G  … 1.78G           2 (17%)          +  0.1% ±  0.3%
  instructions       3.38G  ±  269      3.38G  … 3.38G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    277K  ± 13.9K      268K  …  318K           1 ( 8%)          -  0.7% ±  3.7%
  cache_misses        235K  ± 6.98K      221K  …  244K           2 (17%)          -  1.6% ±  2.1%
  branch_misses      15.8M  ± 27.9K     15.8M  … 15.9M           0 ( 0%)        💩+  1.9% ±  0.2%

Member

@folkertdev left a comment

great work, thanks!

@folkertdev merged commit d154bf0 into trifectatechfoundation:main on May 28, 2025
24 checks passed
@brian-pane deleted the reuse-read branch on May 28, 2025 21:29

3 participants