zstd: Copy literal in 16 byte blocks when possible #592

Merged: klauspost merged 1 commit into master from zstd-16b-literal-copies on May 12, 2022

klauspost (Owner)

Copies literals in 16-byte blocks when possible. Also reduces literal over-allocation when full allocations are allowed.
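A minimal pure-Go sketch of the idea in the title, assuming both the literal buffer and the output buffer carry at least 15 bytes of readable and writable slack past the end. `copyLiterals` is a hypothetical helper, not the library's implementation (which this page does not show), and it does not model the over-allocation change.

```go
package main

import "fmt"

// copyLiterals copies n literal bytes from src into dst starting at dst[pos].
// Hypothetical helper illustrating the technique; not the library's code.
//
// Fast path: when both slices have room for n rounded up to a multiple of 16,
// copy whole 16-byte blocks. The last block may read and write up to 15 bytes
// past n; those extra dst bytes are assumed to be scratch space that later
// output overwrites. Otherwise, fall back to an exact-length copy.
func copyLiterals(dst []byte, pos int, src []byte, n int) {
	rounded := (n + 15) &^ 15 // n rounded up to a multiple of 16
	if pos+rounded <= len(dst) && rounded <= len(src) {
		for i := 0; i < rounded; i += 16 {
			// Slice-to-array-pointer conversion (Go 1.17+) moves exactly 16 bytes.
			*(*[16]byte)(dst[pos+i:]) = *(*[16]byte)(src[i:])
		}
		return
	}
	copy(dst[pos:pos+n], src[:n])
}

func main() {
	lits := []byte("hello, zstd literals!") // 21 literal bytes
	src := make([]byte, 32)                 // literal buffer with 11 bytes of slack
	copy(src, lits)
	dst := make([]byte, 48) // output buffer with room for two full blocks at pos 3
	copyLiterals(dst, 3, src, len(lits))
	fmt.Println(string(dst[3 : 3+len(lits)])) // prints "hello, zstd literals!"
}
```

The trade-off is that the block path is only safe when the bytes just past the literal end may be read and overwritten, so buffers have to be allocated slightly larger than the data they hold; without that slack the sketch falls back to an exact `copy`.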

```
benchmark                                                                                          old ns/op     new ns/op     delta
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32                                                14572         13898         -4.63%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32                                            3946          3682          -6.69%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32                                             45150         43296         -4.11%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32                                               33525         36679         +9.41%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32                                             11952         10496         -12.18%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32                                              14081         13339         -5.27%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32                                                 12111         11745         -3.02%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32                                           1073          1037          -3.36%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32                                           1759          1841          +4.66%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32                                                 43722         39755         -9.07%
BenchmarkDecoder_DecodeAllParallel/html.zst-32                                                     4144          3756          -9.36%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32                                            1240          1240          +0.00%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                                                        250426        240012        -4.16%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32                                                    71861         65548         -8.79%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32                                                     829878        736934        -11.20%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                                                       609402        683505        +12.16%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32                                                     231636        189146        -18.34%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                                                      245022        226451        -7.58%
BenchmarkDecoder_DecodeAll/html_x_4.zst-32                                                         229709        216421        -5.78%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32                                                   18400         17850         -2.99%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32                                                   9682          9801          +1.23%
BenchmarkDecoder_DecodeAll/urls.10K.zst-32                                                         924472        796913        -13.80%
BenchmarkDecoder_DecodeAll/html.zst-32                                                             77728         66831         -14.02%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32                                                    7985          7432          -6.93%
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32           130498        106559        -18.34%
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32          136475        121699        -10.83%
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32                  43119         33598         -22.08%
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32           15723         14472         -7.96%
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32           25968         19734         -24.01%
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32            88906         79506         -10.57%
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                     7385          7269          -1.57%
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32        83133         64295         -22.66%
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                      2899          2881          -0.62%
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                       3951          3961          +0.25%
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                   7063          6809          -3.60%
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                      14045         14050         +0.04%
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32                 19679         18611         -5.43%
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                       48841         45545         -6.75%
Benchmark_seqdec_decodeSync/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        276464        273620        -1.03%
Benchmark_seqdec_decodeSync/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       270905        269049        -0.69%
Benchmark_seqdec_decodeSync/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               146061        145878        -0.13%
Benchmark_seqdec_decodeSync/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        30686         27367         -10.82%
Benchmark_seqdec_decodeSync/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        88493         87167         -1.50%
Benchmark_seqdec_decodeSync/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         195326        195764        +0.22%
Benchmark_seqdec_decodeSync/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  14081         13925         -1.11%
Benchmark_seqdec_decodeSync/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     297178        298192        +0.34%
Benchmark_seqdec_decodeSync/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   2935          2921          -0.48%
Benchmark_seqdec_decodeSync/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    4856          4467          -8.01%
Benchmark_seqdec_decodeSync/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                14059         14050         -0.06%
Benchmark_seqdec_decodeSync/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   35636         33427         -6.20%
Benchmark_seqdec_decodeSync/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              88618         85660         -3.34%
Benchmark_seqdec_decodeSync/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    162282        160568        -1.06%
```

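`lcet10.txt` doesn't like it (+9.41% in `DecodeAllParallel`, +12.16% in `DecodeAll`), and `fireworks.jpeg` is slightly slower (+4.66%, +1.23%); otherwise the results are mostly positive.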
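The `delta` column is the relative change in ns/op, (new - old) / old × 100, so negative values mean faster decoding. A small sketch reproducing a few entries from the table above (`delta` is a hypothetical helper name):

```go
package main

import "fmt"

// delta returns the relative change from old to new ns/op, in percent.
// Negative means the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	fmt.Printf("%+.2f%%\n", delta(33525, 36679))   // DecodeAllParallel/lcet10.txt: +9.41%
	fmt.Printf("%+.2f%%\n", delta(609402, 683505)) // DecodeAll/lcet10.txt: +12.16%
	fmt.Printf("%+.2f%%\n", delta(231636, 189146)) // DecodeAll/asyoulik.txt: -18.34%
}
```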

Streams before/after:
```
BenchmarkDecoderEnwik9-32    	       1	1288277200 ns/op	 776.23 MB/s	   59552 B/op	      44 allocs/op
BenchmarkDecoderEnwik9/multithreaded-writer-32         	       1	1191034000 ns/op	 839.61 MB/s	13993224 B/op	     113 allocs/op

BenchmarkDecoderSilesia-32    	       5	 209913160 ns/op	1009.69 MB/s	   46715 B/op	      38 allocs/op
BenchmarkDecoderSilesia/multithreaded-writer-32         	       5	 201394480 ns/op	1052.40 MB/s	 5129462 B/op	      77 allocs/op
```
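Go's `testing` package derives MB/s from `b.SetBytes`: bytes processed per operation, in units of 10^6 bytes, divided by seconds per operation. Assuming the Enwik9 benchmark calls `b.SetBytes` with the uncompressed size of enwik9 (exactly 10^9 bytes), the reported throughputs follow directly from ns/op. `mbPerSec` is a hypothetical helper for illustration.

```go
package main

import "fmt"

// mbPerSec mirrors the MB/s figure printed by Go benchmarks:
// (bytes per op / 1e6) / (seconds per op).
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	return float64(bytesPerOp) / 1e6 / (nsPerOp / 1e9)
}

func main() {
	const enwik9Size = 1_000_000_000 // assumed bytes per op: enwik9 uncompressed

	fmt.Printf("%.2f MB/s\n", mbPerSec(enwik9Size, 1288277200)) // 776.23 MB/s
	fmt.Printf("%.2f MB/s\n", mbPerSec(enwik9Size, 1191034000)) // 839.61 MB/s (multithreaded-writer)

	// Relative speedup of the multithreaded-writer variant over the default one.
	fmt.Printf("%.3fx\n", 1288277200.0/1191034000.0) // 1.082x
}
```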
@WojciechMula (Contributor)

Great improvement!

@WojciechMula (Contributor)

Funnily enough, I wanted to pick up this problem myself and was planning to ask you where to start. :)

klauspost merged commit 6ebbb85 into master on May 12, 2022
klauspost deleted the zstd-16b-literal-copies branch on May 12, 2022 10:49