klauspost commented Feb 1, 2022

Improve end-of-buffer speed.

Add a `GOAMD64_v3` version with a small improvement for matching. For now it is enabled with a build tag.

It is guarded by build tags to avoid duplicating all the code.
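
To illustrate what "matching" and "end-of-buffer" refer to, here is a minimal sketch of a match-length routine in pure Go. It is not the assembly changed in this PR. The name `matchLen` and its structure are assumptions made for illustration. The sketch compares 8 bytes at a time, uses a trailing-zero count to find the first differing byte, and falls back to byte compares once fewer than 8 bytes remain. The x86-64-v3 level includes BMI1, which provides `TZCNT` for the trailing-zero count.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// matchLen returns the number of leading bytes that a and b have in common.
// Hypothetical helper for illustration only; not the code from this PR.
func matchLen(a, b []byte) int {
	if len(b) < len(a) {
		a = a[:len(b)]
	}
	n := 0
	// Compare 8 bytes at a time while at least 8 bytes remain in both inputs.
	for len(a)-n >= 8 {
		diff := binary.LittleEndian.Uint64(a[n:]) ^ binary.LittleEndian.Uint64(b[n:])
		if diff != 0 {
			// With little-endian loads, the lowest set bit lies in the
			// first differing byte; dividing by 8 gives its byte offset.
			return n + (bits.TrailingZeros64(diff) >> 3)
		}
		n += 8
	}
	// End of buffer: fewer than 8 bytes remain, so compare byte by byte.
	for n < len(a) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	a := []byte("the quick brown fox")
	b := []byte("the quick brOwn fox")
	fmt.Println(matchLen(a, b)) // prints 12
}
```

The tail loop is the part that runs near the end of a buffer, where a full 8-byte load is no longer safe.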

```
benchmark                                                                 old ns/op      new ns/op      delta
BenchmarkTwainEncode1e1/default-32                                        8.32           8.28           -0.49%
BenchmarkTwainEncode1e1/better-32                                         8.36           8.32           -0.53%
BenchmarkTwainEncode1e1/best-32                                           8.31           8.32           +0.02%
BenchmarkTwainEncode1e1/snappy-default-32                                 8.34           8.32           -0.17%
BenchmarkTwainEncode1e1/snappy-better-32                                  8.31           8.31           +0.00%
BenchmarkTwainEncode1e1/snappy-best-32                                    8.31           8.29           -0.20%
BenchmarkTwainEncode1e1/snappy-ref-noasm-32                               7.61           7.62           +0.22%
BenchmarkTwainEncode1e2/default-32                                        94.4           93.8           -0.70%
BenchmarkTwainEncode1e2/better-32                                         273            269            -1.36%
BenchmarkTwainEncode1e2/best-32                                           76827          75007          -2.37%
BenchmarkTwainEncode1e2/snappy-default-32                                 94.7           93.6           -1.17%
BenchmarkTwainEncode1e2/snappy-better-32                                  273            268            -1.58%
BenchmarkTwainEncode1e2/snappy-best-32                                    72735          72867          +0.18%
BenchmarkTwainEncode1e2/snappy-ref-noasm-32                               471            469            -0.25%
BenchmarkTwainEncode1e3/default-32                                        872            867            -0.62%
BenchmarkTwainEncode1e3/better-32                                         2416           2403           -0.54%
BenchmarkTwainEncode1e3/best-32                                           128772         128589         -0.14%
BenchmarkTwainEncode1e3/snappy-default-32                                 869            862            -0.84%
BenchmarkTwainEncode1e3/snappy-better-32                                  2415           2402           -0.54%
BenchmarkTwainEncode1e3/snappy-best-32                                    94544          92615          -2.04%
BenchmarkTwainEncode1e3/snappy-ref-noasm-32                               2317           2328           +0.47%
BenchmarkTwainEncode1e4/default-32                                        10080          9862           -2.16%
BenchmarkTwainEncode1e4/better-32                                         24173          23778          -1.63%
BenchmarkTwainEncode1e4/best-32                                           638221         632676         -0.87%
BenchmarkTwainEncode1e4/snappy-default-32                                 10038          9900           -1.37%
BenchmarkTwainEncode1e4/snappy-better-32                                  24088          23655          -1.80%
BenchmarkTwainEncode1e4/snappy-best-32                                    336750         334551         -0.65%
BenchmarkTwainEncode1e4/snappy-ref-noasm-32                               25050          24941          -0.44%
BenchmarkTwainEncode1e5/default-32                                        208338         204080         -2.04%
BenchmarkTwainEncode1e5/better-32                                         400069         382699         -4.34%
BenchmarkTwainEncode1e5/best-32                                           5249363        5374492        +2.38%
BenchmarkTwainEncode1e5/snappy-default-32                                 207783         200382         -3.56%
BenchmarkTwainEncode1e5/snappy-better-32                                  388589         378026         -2.72%
BenchmarkTwainEncode1e5/snappy-best-32                                    2889378        2781338        -3.74%
BenchmarkTwainEncode1e5/snappy-ref-noasm-32                               487332         484808         -0.52%
BenchmarkTwainEncode1e6/default-32                                        2305542        2251826        -2.33%
BenchmarkTwainEncode1e6/better-32                                         4023332        3904791        -2.95%
BenchmarkTwainEncode1e6/best-32                                           53576955       54518800       +1.76%
BenchmarkTwainEncode1e6/snappy-default-32                                 2300992        2179567        -5.28%
BenchmarkTwainEncode1e6/snappy-better-32                                  3938222        3879487        -1.49%
BenchmarkTwainEncode1e6/snappy-best-32                                    30057235       30808837       +2.50%
BenchmarkTwainEncode1e6/snappy-ref-noasm-32                               4890432        4866709        -0.49%
BenchmarkTwainEncode1e7/default-32                                        23717990       22395276       -5.58%
BenchmarkTwainEncode1e7/better-32                                         42845300       42508469       -0.79%
BenchmarkTwainEncode1e7/best-32                                           1113607500     1111374800     -0.20%
BenchmarkTwainEncode1e7/snappy-default-32                                 23335686       22315622       -4.37%
BenchmarkTwainEncode1e7/snappy-better-32                                  42227550       41652074       -1.36%
BenchmarkTwainEncode1e7/snappy-best-32                                    410723367      421980100      +2.74%
BenchmarkTwainEncode1e7/snappy-ref-noasm-32                               51418814       51197981       -0.43%
```
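
For reading the `delta` column: it is the relative change from old to new ns/op, so negative values mean the new code is faster. A minimal sketch of that arithmetic follows. The function name `delta` is an assumption for illustration. Because the printed ns/op values are rounded, recomputing from them can differ in the last digit on the smallest rows (for example 8.32 to 8.28 gives about -0.48%, while the table lists -0.49%).

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs, in percent.
// Negative results mean the new measurement is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// BenchmarkTwainEncode1e4/default-32 from the table above.
	fmt.Printf("%+.2f%%\n", delta(10080, 9862)) // prints -2.16%
}
```
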
klauspost merged commit a1a9cfc into master Feb 1, 2022
klauspost deleted the s2-update-matching branch February 1, 2022 15:22