Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math: replace exported variables with runtime.KeepAlive in benchmarks #33394

Closed
wants to merge 1 commit into from
Closed

math: replace exported variables with runtime.KeepAlive in benchmarks #33394

wants to merge 1 commit into from

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented Jul 31, 2019

Use runtime.KeepAlive for preventing benchmarked code from being
eliminated as "dead" code (by marking its return values as live),
instead of relying on "package level exported sink variable" hacks that
are far from obvious to those not familiar with the internals of the Go
compilers and also do not seem to be robust in the long term (I mean,
it seems like an update to one of the Go implementations could easily
break benchmarks by doing dead code elimination of benchmarked code
regardless of those sink variables).

The discussion that led to this patch happening:
#33325

Also changed the BenchmarkDim benchmark a bit since it seemed very
possible it would be optimized out otherwise (because subtracting zero
from any number is a non-operation).

Some other benchmarks other than BenchmarkDim also experienced speedups
or slowdowns, which is quite mysterious to me in some cases (although I
have not checked the assembly) because KeepAlive is supposed to be
intrinsified?

Benchstat output, benchmarked with tip on Linux AMD64 on Intel i5-8300H:

name old time/op new time/op delta
Acos-8 13.6ns ± 0% 13.8ns ± 0% +1.72% (p=0.000 n=13+15)
Acosh-8 18.7ns ± 1% 19.1ns ± 0% +2.37% (p=0.000 n=14+13)
Asin-8 10.7ns ± 1% 11.0ns ± 1% +3.33% (p=0.000 n=14+15)
Asinh-8 24.6ns ± 1% 25.2ns ± 0% +2.44% (p=0.000 n=15+15)
Atan-8 6.68ns ± 0% 6.96ns ± 0% +4.24% (p=0.000 n=14+15)
Atanh-8 20.9ns ± 0% 21.1ns ± 0% +0.89% (p=0.000 n=15+15)
Atan2-8 12.4ns ± 1% 12.7ns ± 0% +2.36% (p=0.000 n=15+15)
Cbrt-8 10.9ns ± 0% 11.9ns ±11% +9.54% (p=0.000 n=13+15)
Ceil-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal)
Copysign-8 0.75ns ± 1% 0.75ns ± 0% ~ (p=0.902 n=14+16)
Cos-8 9.76ns ± 0% 9.88ns ± 0% +1.23% (p=0.000 n=15+13)
Cosh-8 13.7ns ± 2% 13.8ns ± 0% +0.83% (p=0.001 n=15+13)
Erf-8 6.86ns ± 0% 7.19ns ± 0% +4.84% (p=0.000 n=14+14)
Erfc-8 8.10ns ± 1% 8.29ns ± 1% +2.31% (p=0.000 n=15+16)
Erfinv-8 8.78ns ± 0% 9.08ns ± 0% +3.33% (p=0.000 n=14+14)
Erfcinv-8 8.95ns ± 0% 9.07ns ± 0% +1.39% (p=0.000 n=14+15)
Exp-8 8.56ns ± 0% 8.60ns ± 0% +0.48% (p=0.000 n=14+15)
ExpGo-8 22.5ns ± 1% 22.3ns ± 0% -0.54% (p=0.000 n=16+15)
Expm1-8 11.8ns ± 0% 12.1ns ± 0% +2.54% (p=0.000 n=16+14)
Exp2-8 20.8ns ± 0% 21.3ns ± 1% +2.23% (p=0.000 n=14+14)
Exp2Go-8 20.9ns ± 0% 21.7ns ± 1% +3.88% (p=0.000 n=14+15)
Abs-8 0.38ns ± 0% 0.38ns ± 0% ~ (p=0.082 n=15+13)
Dim-8 0.63ns ± 0% 1.21ns ±47% +92.18% (p=0.000 n=16+16)
Floor-8 2.01ns ± 1% 2.01ns ± 0% -0.14% (p=0.000 n=14+16)
Max-8 2.44ns ±12% 2.55ns ±31% ~ (p=0.187 n=16+14)
Min-8 3.02ns ±49% 3.41ns ±44% ~ (p=0.115 n=16+16)
Mod-8 41.9ns ±32% 37.4ns ±34% ~ (p=0.115 n=16+16)
Frexp-8 5.17ns ±39% 4.31ns ±26% ~ (p=0.116 n=16+14)
Gamma-8 11.5ns ±54% 10.8ns ±49% ~ (p=0.291 n=16+16)
Hypot-8 3.93ns ±23% 3.22ns ± 3% ~ (p=0.087 n=16+13)
HypotGo-8 5.92ns ±27% 6.25ns ±45% ~ (p=0.830 n=16+16)
Ilogb-8 3.59ns ±49% 3.73ns ±46% +4.01% (p=0.016 n=16+16)
J0-8 50.0ns ± 0% 59.9ns ±47% +19.74% (p=0.000 n=13+16)
J1-8 50.1ns ± 0% 50.8ns ± 0% +1.40% (p=0.000 n=13+13)
Jn-8 108ns ± 0% 108ns ± 0% ~ (all equal)
Ldexp-8 5.67ns ±15% 5.56ns ± 0% ~ (p=0.831 n=13+15)
Lgamma-8 14.3ns ±52% 13.5ns ±49% ~ (p=0.316 n=16+16)
Log-8 8.95ns ± 1% 10.24ns ±46% +14.49% (p=0.000 n=13+15)
Logb-8 4.00ns ±48% 3.92ns ±50% ~ (p=0.387 n=16+15)
Log1p-8 16.7ns ±43% 13.6ns ± 1% ~ (p=0.241 n=16+13)
Log10-8 11.2ns ± 2% 11.6ns ± 0% +3.43% (p=0.000 n=13+14)
Log2-8 5.90ns ±48% 6.32ns ±45% ~ (p=0.130 n=15+16)
Modf-8 4.73ns ±31% 4.70ns ±28% ~ (p=0.993 n=16+16)
Nextafter32-8 4.66ns ±37% 4.55ns ±31% ~ (p=0.299 n=16+16)
Nextafter64-8 3.37ns ±15% 3.95ns ±44% ~ (p=0.133 n=14+16)
PowInt-8 32.4ns ±34% 31.9ns ±37% ~ (p=0.830 n=16+16)
PowFrac-8 66.7ns ± 0% 77.3ns ±50% ~ (p=0.160 n=13+16)
Pow10Pos-8 1.01ns ± 1% 1.26ns ±21% +25.36% (p=0.000 n=16+15)
Pow10Neg-8 1.54ns ±41% 1.74ns ±28% ~ (p=0.069 n=16+16)
Round-8 2.53ns ±39% 2.47ns ±41% ~ (p=0.229 n=16+16)
RoundToEven-8 0.57ns ± 1% 0.63ns ± 2% +11.42% (p=0.000 n=13+13)
Remainder-8 27.4ns ± 1% 26.9ns ± 0% -1.80% (p=0.000 n=13+13)
Signbit-8 0.50ns ± 1% 0.75ns ± 0% +49.28% (p=0.000 n=13+13)
Sin-8 9.09ns ± 0% 9.49ns ±19% +4.45% (p=0.000 n=15+13)
Sincos-8 11.5ns ± 1% 11.7ns ± 0% +1.73% (p=0.000 n=15+13)
Sinh-8 13.7ns ± 3% 13.9ns ± 0% ~ (p=0.628 n=16+15)
SqrtIndirect-8 2.26ns ± 1% 2.45ns ±32% ~ (p=0.075 n=14+16)
SqrtLatency-8 3.27ns ± 1% 3.51ns ±46% ~ (p=0.490 n=14+14)
SqrtIndirectLatency-8 5.78ns ± 0% 6.03ns ± 0% +4.33% (p=0.000 n=16+16)
SqrtGoLatency-8 34.7ns ± 0% 35.0ns ± 0% +0.88% (p=0.000 n=16+16)
SqrtPrime-8 2.53µs ± 0% 2.56µs ± 0% +1.34% (p=0.000 n=16+16)
Tan-8 9.59ns ± 0% 9.88ns ± 0% +3.04% (p=0.000 n=16+15)
Tanh-8 14.3ns ± 3% 14.5ns ± 0% +1.58% (p=0.001 n=16+13)
Trunc-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal)
Y0-8 49.0ns ± 1% 49.7ns ± 0% +1.56% (p=0.000 n=16+15)
Y1-8 49.0ns ± 1% 49.7ns ± 0% +1.28% (p=0.000 n=16+15)
Yn-8 106ns ± 0% 106ns ± 0% ~ (all equal)
Float64bits-8 0.50ns ± 0% 0.30ns ± 1% -40.17% (p=0.000 n=16+16)
Float64frombits-8 0.50ns ± 0% 0.28ns ± 0% -43.82% (p=0.000 n=15+16)
Float32bits-8 0.25ns ± 0% 0.28ns ± 0% +12.25% (p=0.000 n=16+12)
Float32frombits-8 0.25ns ± 0% 0.25ns ± 0% ~ (p=0.704 n=16+16)

Use runtime.KeepAlive for preventing benchmarked code from being
eliminated as "dead" code (by marking its return values as live),
instead of relying on "package level exported sink variable" hacks that
are far from obvious to those not familiar with the internals of the Go
compilers and also do not seem to be robust in the long term (I mean,
it seems like an update to one of the Go implementations could easily
break benchmarks by doing dead code elimination of benchmarked code
regardless of those sink variables).

The discussion that led to this patch happening:
#33325

Also changed the BenchmarkDim benchmark a bit since it seemed very
possible it would be optimized out otherwise (because subtracting zero
from any number is a non-operation).

Some other benchmarks other than BenchmarkDim also experienced speedups
or slowdowns, which is quite mysterious to me in some cases (although I
have not checked the assembly) because KeepAlive is supposed to be
intrinsified?

Benchstat output, benchmarked with tip on Linux AMD64 on Intel i5-8300H:

name                   old time/op  new time/op   delta
Acos-8                 13.6ns ± 0%   13.8ns ± 0%   +1.72%  (p=0.000 n=13+15)
Acosh-8                18.7ns ± 1%   19.1ns ± 0%   +2.37%  (p=0.000 n=14+13)
Asin-8                 10.7ns ± 1%   11.0ns ± 1%   +3.33%  (p=0.000 n=14+15)
Asinh-8                24.6ns ± 1%   25.2ns ± 0%   +2.44%  (p=0.000 n=15+15)
Atan-8                 6.68ns ± 0%   6.96ns ± 0%   +4.24%  (p=0.000 n=14+15)
Atanh-8                20.9ns ± 0%   21.1ns ± 0%   +0.89%  (p=0.000 n=15+15)
Atan2-8                12.4ns ± 1%   12.7ns ± 0%   +2.36%  (p=0.000 n=15+15)
Cbrt-8                 10.9ns ± 0%   11.9ns ±11%   +9.54%  (p=0.000 n=13+15)
Ceil-8                 2.01ns ± 0%   2.01ns ± 0%     ~     (all equal)
Copysign-8             0.75ns ± 1%   0.75ns ± 0%     ~     (p=0.902 n=14+16)
Cos-8                  9.76ns ± 0%   9.88ns ± 0%   +1.23%  (p=0.000 n=15+13)
Cosh-8                 13.7ns ± 2%   13.8ns ± 0%   +0.83%  (p=0.001 n=15+13)
Erf-8                  6.86ns ± 0%   7.19ns ± 0%   +4.84%  (p=0.000 n=14+14)
Erfc-8                 8.10ns ± 1%   8.29ns ± 1%   +2.31%  (p=0.000 n=15+16)
Erfinv-8               8.78ns ± 0%   9.08ns ± 0%   +3.33%  (p=0.000 n=14+14)
Erfcinv-8              8.95ns ± 0%   9.07ns ± 0%   +1.39%  (p=0.000 n=14+15)
Exp-8                  8.56ns ± 0%   8.60ns ± 0%   +0.48%  (p=0.000 n=14+15)
ExpGo-8                22.5ns ± 1%   22.3ns ± 0%   -0.54%  (p=0.000 n=16+15)
Expm1-8                11.8ns ± 0%   12.1ns ± 0%   +2.54%  (p=0.000 n=16+14)
Exp2-8                 20.8ns ± 0%   21.3ns ± 1%   +2.23%  (p=0.000 n=14+14)
Exp2Go-8               20.9ns ± 0%   21.7ns ± 1%   +3.88%  (p=0.000 n=14+15)
Abs-8                  0.38ns ± 0%   0.38ns ± 0%     ~     (p=0.082 n=15+13)
Dim-8                  0.63ns ± 0%   1.21ns ±47%  +92.18%  (p=0.000 n=16+16)
Floor-8                2.01ns ± 1%   2.01ns ± 0%   -0.14%  (p=0.000 n=14+16)
Max-8                  2.44ns ±12%   2.55ns ±31%     ~     (p=0.187 n=16+14)
Min-8                  3.02ns ±49%   3.41ns ±44%     ~     (p=0.115 n=16+16)
Mod-8                  41.9ns ±32%   37.4ns ±34%     ~     (p=0.115 n=16+16)
Frexp-8                5.17ns ±39%   4.31ns ±26%     ~     (p=0.116 n=16+14)
Gamma-8                11.5ns ±54%   10.8ns ±49%     ~     (p=0.291 n=16+16)
Hypot-8                3.93ns ±23%   3.22ns ± 3%     ~     (p=0.087 n=16+13)
HypotGo-8              5.92ns ±27%   6.25ns ±45%     ~     (p=0.830 n=16+16)
Ilogb-8                3.59ns ±49%   3.73ns ±46%   +4.01%  (p=0.016 n=16+16)
J0-8                   50.0ns ± 0%   59.9ns ±47%  +19.74%  (p=0.000 n=13+16)
J1-8                   50.1ns ± 0%   50.8ns ± 0%   +1.40%  (p=0.000 n=13+13)
Jn-8                    108ns ± 0%    108ns ± 0%     ~     (all equal)
Ldexp-8                5.67ns ±15%   5.56ns ± 0%     ~     (p=0.831 n=13+15)
Lgamma-8               14.3ns ±52%   13.5ns ±49%     ~     (p=0.316 n=16+16)
Log-8                  8.95ns ± 1%  10.24ns ±46%  +14.49%  (p=0.000 n=13+15)
Logb-8                 4.00ns ±48%   3.92ns ±50%     ~     (p=0.387 n=16+15)
Log1p-8                16.7ns ±43%   13.6ns ± 1%     ~     (p=0.241 n=16+13)
Log10-8                11.2ns ± 2%   11.6ns ± 0%   +3.43%  (p=0.000 n=13+14)
Log2-8                 5.90ns ±48%   6.32ns ±45%     ~     (p=0.130 n=15+16)
Modf-8                 4.73ns ±31%   4.70ns ±28%     ~     (p=0.993 n=16+16)
Nextafter32-8          4.66ns ±37%   4.55ns ±31%     ~     (p=0.299 n=16+16)
Nextafter64-8          3.37ns ±15%   3.95ns ±44%     ~     (p=0.133 n=14+16)
PowInt-8               32.4ns ±34%   31.9ns ±37%     ~     (p=0.830 n=16+16)
PowFrac-8              66.7ns ± 0%   77.3ns ±50%     ~     (p=0.160 n=13+16)
Pow10Pos-8             1.01ns ± 1%   1.26ns ±21%  +25.36%  (p=0.000 n=16+15)
Pow10Neg-8             1.54ns ±41%   1.74ns ±28%     ~     (p=0.069 n=16+16)
Round-8                2.53ns ±39%   2.47ns ±41%     ~     (p=0.229 n=16+16)
RoundToEven-8          0.57ns ± 1%   0.63ns ± 2%  +11.42%  (p=0.000 n=13+13)
Remainder-8            27.4ns ± 1%   26.9ns ± 0%   -1.80%  (p=0.000 n=13+13)
Signbit-8              0.50ns ± 1%   0.75ns ± 0%  +49.28%  (p=0.000 n=13+13)
Sin-8                  9.09ns ± 0%   9.49ns ±19%   +4.45%  (p=0.000 n=15+13)
Sincos-8               11.5ns ± 1%   11.7ns ± 0%   +1.73%  (p=0.000 n=15+13)
Sinh-8                 13.7ns ± 3%   13.9ns ± 0%     ~     (p=0.628 n=16+15)
SqrtIndirect-8         2.26ns ± 1%   2.45ns ±32%     ~     (p=0.075 n=14+16)
SqrtLatency-8          3.27ns ± 1%   3.51ns ±46%     ~     (p=0.490 n=14+14)
SqrtIndirectLatency-8  5.78ns ± 0%   6.03ns ± 0%   +4.33%  (p=0.000 n=16+16)
SqrtGoLatency-8        34.7ns ± 0%   35.0ns ± 0%   +0.88%  (p=0.000 n=16+16)
SqrtPrime-8            2.53µs ± 0%   2.56µs ± 0%   +1.34%  (p=0.000 n=16+16)
Tan-8                  9.59ns ± 0%   9.88ns ± 0%   +3.04%  (p=0.000 n=16+15)
Tanh-8                 14.3ns ± 3%   14.5ns ± 0%   +1.58%  (p=0.001 n=16+13)
Trunc-8                2.01ns ± 0%   2.01ns ± 0%     ~     (all equal)
Y0-8                   49.0ns ± 1%   49.7ns ± 0%   +1.56%  (p=0.000 n=16+15)
Y1-8                   49.0ns ± 1%   49.7ns ± 0%   +1.28%  (p=0.000 n=16+15)
Yn-8                    106ns ± 0%    106ns ± 0%     ~     (all equal)
Float64bits-8          0.50ns ± 0%   0.30ns ± 1%  -40.17%  (p=0.000 n=16+16)
Float64frombits-8      0.50ns ± 0%   0.28ns ± 0%  -43.82%  (p=0.000 n=15+16)
Float32bits-8          0.25ns ± 0%   0.28ns ± 0%  +12.25%  (p=0.000 n=16+12)
Float32frombits-8      0.25ns ± 0%   0.25ns ± 0%     ~     (p=0.704 n=16+16)
@googlebot googlebot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Jul 31, 2019
@gopherbot
Copy link
Contributor

This PR (HEAD: 54358ad) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/188437 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Brad Fitzpatrick:

Patch Set 1:

Look at memory allocations in the benchmarks too. I suspect KeepAlive is allocating, boxing the values into an interface{}.

You might try doing:

x := Atanh(tt.in)
runtime.KeepAlive(&x)

... to use a pointer to KeepAlive instead. Maybe the compiler deals with that better without the allocation.

File a compiler bug, though.


Please don’t reply on this GitHub thread. Visit golang.org/cl/188437.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Neven Sajko:

Patch Set 1:

Patch Set 1:

Look at memory allocations in the benchmarks too. I suspect KeepAlive is allocating, boxing the values into an interface{}.

You might try doing:

x := Atanh(tt.in)
runtime.KeepAlive(&x)

... to use a pointer to KeepAlive instead. Maybe the compiler deals with that better without the allocation.

File a compiler bug, though.

There were no allocations, but there does seem to be an issue, so I filed one on Github: #33442


Please don’t reply on this GitHub thread. Visit golang.org/cl/188437.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants