@nightwolfz commented Sep 25, 2022

For reference, I also tried aligning the other structs one at a time, but none of those changes had a measurable effect.

```
benchstat -delta-test none old2.txt new2.txt
name                                  old time/op    new time/op     delta
Encoder_EncodeAllXML-16                 13.3ms ± 0%     13.0ms ± 0%    -2.11%
Encoder_EncodeAllSimple/fastest-16       229µs ± 0%      227µs ± 0%    -1.08%
Encoder_EncodeAllSimple/default-16       343µs ± 0%      371µs ± 0%    +8.13%
Encoder_EncodeAllSimple/better-16        402µs ± 0%      393µs ± 0%    -2.33%
Encoder_EncodeAllSimple/best-16         6.41ms ± 0%     2.72ms ± 0%   -57.48%   <====
Encoder_EncodeAllSimple4K/fastest-16    2.70µs ± 0%     2.56µs ± 0%    -5.26%
Encoder_EncodeAllSimple4K/default-16    33.1µs ± 0%     33.5µs ± 0%    +1.30%
Encoder_EncodeAllSimple4K/better-16     39.3µs ± 0%     38.8µs ± 0%    -1.12%
Encoder_EncodeAllSimple4K/best-16        732µs ± 0%      360µs ± 0%   -50.90%   <====
Encoder_EncodeAllHTML-16                 213µs ± 0%      209µs ± 0%    -2.07%
Encoder_EncodeAllTwain-16               3.23ms ± 0%     3.23ms ± 0%    -0.04%
Encoder_EncodeAllPi-16                  1.12ms ± 0%     1.11ms ± 0%    -1.01%
Random4KEncodeAllFastest-16              988ns ± 0%      976ns ± 0%    -1.31%
Random10MBEncodeAllFastest-16           2.50ms ± 0%     2.48ms ± 0%    -0.70%
Random4KEncodeAllDefault-16             4.58µs ± 0%     4.56µs ± 0%    -0.31%
RandomEncodeAllDefault-16               2.58ms ± 0%     2.52ms ± 0%    -2.20%
Random10MBEncoderFastest-16             3.61ms ± 0%     3.61ms ± 0%    -0.04%
RandomEncoderDefault-16                 3.44ms ± 0%     3.44ms ± 0%    +0.03%

name                                  old speed      new speed       delta
Encoder_EncodeAllXML-16                402MB/s ± 0%    410MB/s ± 0%    +2.16%
Encoder_EncodeAllSimple/fastest-16     173MB/s ± 0%    175MB/s ± 0%    +1.10%
Encoder_EncodeAllSimple/default-16     116MB/s ± 0%    107MB/s ± 0%    -7.52%
Encoder_EncodeAllSimple/better-16     99.0MB/s ± 0%  101.4MB/s ± 0%    +2.38%
Encoder_EncodeAllSimple/best-16       6.21MB/s ± 0%  14.61MB/s ± 0%  +135.27%  <====
Encoder_EncodeAllSimple4K/fastest-16  1.52GB/s ± 0%   1.60GB/s ± 0%    +5.56%
Encoder_EncodeAllSimple4K/default-16   124MB/s ± 0%    122MB/s ± 0%    -1.29%
Encoder_EncodeAllSimple4K/better-16    104MB/s ± 0%    106MB/s ± 0%    +1.13%
Encoder_EncodeAllSimple4K/best-16     5.59MB/s ± 0%  11.39MB/s ± 0%  +103.76%  <====
Encoder_EncodeAllHTML-16               208MB/s ± 0%    213MB/s ± 0%    +2.11%
Encoder_EncodeAllTwain-16              120MB/s ± 0%    120MB/s ± 0%    +0.04%
Encoder_EncodeAllPi-16                89.0MB/s ± 0%   89.9MB/s ± 0%    +1.02%
Random4KEncodeAllFastest-16           4.14GB/s ± 0%   4.20GB/s ± 0%    +1.32%
Random10MBEncodeAllFastest-16         4.19GB/s ± 0%   4.22GB/s ± 0%    +0.71%
Random4KEncodeAllDefault-16            895MB/s ± 0%    897MB/s ± 0%    +0.31%
RandomEncodeAllDefault-16             4.06GB/s ± 0%   4.15GB/s ± 0%    +2.25%
Random10MBEncoderFastest-16           2.90GB/s ± 0%   2.90GB/s ± 0%    +0.04%
RandomEncoderDefault-16               3.05GB/s ± 0%   3.05GB/s ± 0%    -0.03%
```
@nightwolfz changed the title from "[zstd/enc] Cache align struct for big perf boost" to "zstd: Cache align struct for big perf boost" on Sep 25, 2022
@nightwolfz changed the title from "zstd: Cache align struct for big perf boost" to "zstd: Improve "best" compression" on Sep 25, 2022
@klauspost
Owner

Very nice! Could you add a comment explaining how you determined the size?

That way, if the struct is changed, whoever is looking at it will know how to adjust it.

@nightwolfz
Contributor Author

@klauspost All done :)

@klauspost merged commit 3822c7c into klauspost:master on Sep 25, 2022