Commit
zstd: Copy literal in 16 byte blocks when possible (#592)
Also reduces literal overalloc when full allocs are allowed.

```
benchmark  old ns/op  new ns/op  delta
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32  14572  13898  -4.63%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32  3946  3682  -6.69%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32  45150  43296  -4.11%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32  33525  36679  +9.41%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32  11952  10496  -12.18%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32  14081  13339  -5.27%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32  12111  11745  -3.02%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32  1073  1037  -3.36%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32  1759  1841  +4.66%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32  43722  39755  -9.07%
BenchmarkDecoder_DecodeAllParallel/html.zst-32  4144  3756  -9.36%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32  1240  1240  +0.00%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32  250426  240012  -4.16%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32  71861  65548  -8.79%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32  829878  736934  -11.20%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32  609402  683505  +12.16%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32  231636  189146  -18.34%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32  245022  226451  -7.58%
BenchmarkDecoder_DecodeAll/html_x_4.zst-32  229709  216421  -5.78%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32  18400  17850  -2.99%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32  9682  9801  +1.23%
BenchmarkDecoder_DecodeAll/urls.10K.zst-32  924472  796913  -13.80%
BenchmarkDecoder_DecodeAll/html.zst-32  77728  66831  -14.02%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32  7985  7432  -6.93%
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  130498  106559  -18.34%
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  136475  121699  -10.83%
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  43119  33598  -22.08%
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  15723  14472  -7.96%
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  25968  19734  -24.01%
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  88906  79506  -10.57%
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  7385  7269  -1.57%
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  83133  64295  -22.66%
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  2899  2881  -0.62%
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  3951  3961  +0.25%
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  7063  6809  -3.60%
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  14045  14050  +0.04%
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  19679  18611  -5.43%
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  48841  45545  -6.75%
Benchmark_seqdec_decodeSync/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32  276464  273620  -1.03%
Benchmark_seqdec_decodeSync/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32  270905  269049  -0.69%
Benchmark_seqdec_decodeSync/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32  146061  145878  -0.13%
Benchmark_seqdec_decodeSync/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32  30686  27367  -10.82%
Benchmark_seqdec_decodeSync/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32  88493  87167  -1.50%
Benchmark_seqdec_decodeSync/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32  195326  195764  +0.22%
Benchmark_seqdec_decodeSync/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32  14081  13925  -1.11%
Benchmark_seqdec_decodeSync/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32  297178  298192  +0.34%
Benchmark_seqdec_decodeSync/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32  2935  2921  -0.48%
Benchmark_seqdec_decodeSync/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32  4856  4467  -8.01%
Benchmark_seqdec_decodeSync/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32  14059  14050  -0.06%
Benchmark_seqdec_decodeSync/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32  35636  33427  -6.20%
Benchmark_seqdec_decodeSync/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32  88618  85660  -3.34%
Benchmark_seqdec_decodeSync/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32  162282  160568  -1.06%
```

`lcet10.txt` doesn't like it, otherwise mostly positive.

Streams before/after:

```
BenchmarkDecoderEnwik9-32  1  1288277200 ns/op  776.23 MB/s  59552 B/op  44 allocs/op
BenchmarkDecoderEnwik9/multithreaded-writer-32  1  1191034000 ns/op  839.61 MB/s  13993224 B/op  113 allocs/op
BenchmarkDecoderSilesia-32  5  209913160 ns/op  1009.69 MB/s  46715 B/op  38 allocs/op
BenchmarkDecoderSilesia/multithreaded-writer-32  5  201394480 ns/op  1052.40 MB/s  5129462 B/op  77 allocs/op
```
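
For readers unfamiliar with the idea in the title, here is a minimal sketch of copying literals in 16-byte blocks. It is not the code from this commit: `copyLiterals` and its parameters are hypothetical names chosen for illustration, and the sketch assumes that bytes in `dst` past the copied range are scratch space that will be overwritten later, as in an output buffer that is still being filled. Fixed-size 16-byte copies are used while both slices have room, and an exact-length copy handles the tail.

```go
package main

import "fmt"

// copyLiterals is a hypothetical helper, not the function from the commit.
// It copies n literal bytes from src into dst starting at pos and returns
// the new write position. Callers must ensure n <= len(src) and
// pos+n <= len(dst).
func copyLiterals(dst []byte, pos int, src []byte, n int) int {
	i := 0
	// Fast path: constant-size 16-byte copies. The last block may write
	// past pos+n, but it never leaves the bounds of dst or src.
	for i < n && pos+i+16 <= len(dst) && i+16 <= len(src) {
		copy(dst[pos+i:pos+i+16], src[i:i+16])
		i += 16
	}
	// Slow path: not enough slack for a full block, copy the exact tail.
	if i < n {
		copy(dst[pos+i:pos+n], src[i:n])
	}
	return pos + n
}

func main() {
	dst := make([]byte, 64)
	lits := []byte("0123456789abcdefghijXXXXXXXXXXXXXXXX")
	end := copyLiterals(dst, 0, lits, 20)
	fmt.Printf("%d %q\n", end, dst[:end])
}
```

The trade-off is that the fast path needs slack in both buffers, which is presumably why the commit message also mentions literal overallocation.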