-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math: replace exported variables with runtime.KeepAlive in benchmarks #33394
math: replace exported variables with runtime.KeepAlive in benchmarks #33394
Conversation
Use runtime.KeepAlive for preventing benchmarked code from being eliminated as "dead" code (by marking its return values as live), instead of relying on "package level exported sink variable" hacks that are far from obvious to those not familiar with the internals of the Go compilers and also do not seem to be robust in the long term (I mean, it seems like an update to one of the Go implementations could easily break benchmarks by doing dead code elimination of benchmarked code regardless of those sink variables). The discussion that led to this patch happening: #33325 Also changed the BenchmarkDim benchmark a bit since it seemed very possible it would be optimized out otherwise (because subtracting zero from any number is a non-operation). Some other benchmarks other than BenchmarkDim also experienced speedups or slowdowns, which is quite mysterious to me in some cases (although I have not checked the assembly) because KeepAlive is supposed to be intrinsified? Benchstat output, benchmarked with tip on Linux AMD64 on Intel i5-8300H: name old time/op new time/op delta Acos-8 13.6ns ± 0% 13.8ns ± 0% +1.72% (p=0.000 n=13+15) Acosh-8 18.7ns ± 1% 19.1ns ± 0% +2.37% (p=0.000 n=14+13) Asin-8 10.7ns ± 1% 11.0ns ± 1% +3.33% (p=0.000 n=14+15) Asinh-8 24.6ns ± 1% 25.2ns ± 0% +2.44% (p=0.000 n=15+15) Atan-8 6.68ns ± 0% 6.96ns ± 0% +4.24% (p=0.000 n=14+15) Atanh-8 20.9ns ± 0% 21.1ns ± 0% +0.89% (p=0.000 n=15+15) Atan2-8 12.4ns ± 1% 12.7ns ± 0% +2.36% (p=0.000 n=15+15) Cbrt-8 10.9ns ± 0% 11.9ns ±11% +9.54% (p=0.000 n=13+15) Ceil-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal) Copysign-8 0.75ns ± 1% 0.75ns ± 0% ~ (p=0.902 n=14+16) Cos-8 9.76ns ± 0% 9.88ns ± 0% +1.23% (p=0.000 n=15+13) Cosh-8 13.7ns ± 2% 13.8ns ± 0% +0.83% (p=0.001 n=15+13) Erf-8 6.86ns ± 0% 7.19ns ± 0% +4.84% (p=0.000 n=14+14) Erfc-8 8.10ns ± 1% 8.29ns ± 1% +2.31% (p=0.000 n=15+16) Erfinv-8 8.78ns ± 0% 9.08ns ± 0% +3.33% (p=0.000 n=14+14) Erfcinv-8 8.95ns ± 0% 9.07ns ± 0% +1.39% (p=0.000 n=14+15) Exp-8 8.56ns ± 0% 8.60ns ± 0% +0.48% (p=0.000 n=14+15) ExpGo-8 22.5ns ± 1% 22.3ns ± 0% -0.54% (p=0.000 n=16+15) Expm1-8 11.8ns ± 0% 12.1ns ± 0% +2.54% (p=0.000 n=16+14) Exp2-8 20.8ns ± 0% 21.3ns ± 1% +2.23% (p=0.000 n=14+14) Exp2Go-8 20.9ns ± 0% 21.7ns ± 1% +3.88% (p=0.000 n=14+15) Abs-8 0.38ns ± 0% 0.38ns ± 0% ~ (p=0.082 n=15+13) Dim-8 0.63ns ± 0% 1.21ns ±47% +92.18% (p=0.000 n=16+16) Floor-8 2.01ns ± 1% 2.01ns ± 0% -0.14% (p=0.000 n=14+16) Max-8 2.44ns ±12% 2.55ns ±31% ~ (p=0.187 n=16+14) Min-8 3.02ns ±49% 3.41ns ±44% ~ (p=0.115 n=16+16) Mod-8 41.9ns ±32% 37.4ns ±34% ~ (p=0.115 n=16+16) Frexp-8 5.17ns ±39% 4.31ns ±26% ~ (p=0.116 n=16+14) Gamma-8 11.5ns ±54% 10.8ns ±49% ~ (p=0.291 n=16+16) Hypot-8 3.93ns ±23% 3.22ns ± 3% ~ (p=0.087 n=16+13) HypotGo-8 5.92ns ±27% 6.25ns ±45% ~ (p=0.830 n=16+16) Ilogb-8 3.59ns ±49% 3.73ns ±46% +4.01% (p=0.016 n=16+16) J0-8 50.0ns ± 0% 59.9ns ±47% +19.74% (p=0.000 n=13+16) J1-8 50.1ns ± 0% 50.8ns ± 0% +1.40% (p=0.000 n=13+13) Jn-8 108ns ± 0% 108ns ± 0% ~ (all equal) Ldexp-8 5.67ns ±15% 5.56ns ± 0% ~ (p=0.831 n=13+15) Lgamma-8 14.3ns ±52% 13.5ns ±49% ~ (p=0.316 n=16+16) Log-8 8.95ns ± 1% 10.24ns ±46% +14.49% (p=0.000 n=13+15) Logb-8 4.00ns ±48% 3.92ns ±50% ~ (p=0.387 n=16+15) Log1p-8 16.7ns ±43% 13.6ns ± 1% ~ (p=0.241 n=16+13) Log10-8 11.2ns ± 2% 11.6ns ± 0% +3.43% (p=0.000 n=13+14) Log2-8 5.90ns ±48% 6.32ns ±45% ~ (p=0.130 n=15+16) Modf-8 4.73ns ±31% 4.70ns ±28% ~ (p=0.993 n=16+16) Nextafter32-8 4.66ns ±37% 4.55ns ±31% ~ (p=0.299 n=16+16) Nextafter64-8 3.37ns ±15% 3.95ns ±44% ~ (p=0.133 n=14+16) PowInt-8 32.4ns ±34% 31.9ns ±37% ~ (p=0.830 n=16+16) PowFrac-8 66.7ns ± 0% 77.3ns ±50% ~ (p=0.160 n=13+16) Pow10Pos-8 1.01ns ± 1% 1.26ns ±21% +25.36% (p=0.000 n=16+15) Pow10Neg-8 1.54ns ±41% 1.74ns ±28% ~ (p=0.069 n=16+16) Round-8 2.53ns ±39% 2.47ns ±41% ~ (p=0.229 n=16+16) RoundToEven-8 0.57ns ± 1% 0.63ns ± 2% +11.42% (p=0.000 n=13+13) Remainder-8 27.4ns ± 1% 26.9ns ± 0% -1.80% (p=0.000 n=13+13) Signbit-8 0.50ns ± 1% 0.75ns ± 0% +49.28% (p=0.000 n=13+13) Sin-8 9.09ns ± 0% 9.49ns ±19% +4.45% (p=0.000 n=15+13) Sincos-8 11.5ns ± 1% 11.7ns ± 0% +1.73% (p=0.000 n=15+13) Sinh-8 13.7ns ± 3% 13.9ns ± 0% ~ (p=0.628 n=16+15) SqrtIndirect-8 2.26ns ± 1% 2.45ns ±32% ~ (p=0.075 n=14+16) SqrtLatency-8 3.27ns ± 1% 3.51ns ±46% ~ (p=0.490 n=14+14) SqrtIndirectLatency-8 5.78ns ± 0% 6.03ns ± 0% +4.33% (p=0.000 n=16+16) SqrtGoLatency-8 34.7ns ± 0% 35.0ns ± 0% +0.88% (p=0.000 n=16+16) SqrtPrime-8 2.53µs ± 0% 2.56µs ± 0% +1.34% (p=0.000 n=16+16) Tan-8 9.59ns ± 0% 9.88ns ± 0% +3.04% (p=0.000 n=16+15) Tanh-8 14.3ns ± 3% 14.5ns ± 0% +1.58% (p=0.001 n=16+13) Trunc-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal) Y0-8 49.0ns ± 1% 49.7ns ± 0% +1.56% (p=0.000 n=16+15) Y1-8 49.0ns ± 1% 49.7ns ± 0% +1.28% (p=0.000 n=16+15) Yn-8 106ns ± 0% 106ns ± 0% ~ (all equal) Float64bits-8 0.50ns ± 0% 0.30ns ± 1% -40.17% (p=0.000 n=16+16) Float64frombits-8 0.50ns ± 0% 0.28ns ± 0% -43.82% (p=0.000 n=15+16) Float32bits-8 0.25ns ± 0% 0.28ns ± 0% +12.25% (p=0.000 n=16+12) Float32frombits-8 0.25ns ± 0% 0.25ns ± 0% ~ (p=0.704 n=16+16)
This PR (HEAD: 54358ad) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/188437 to see it. Tip: You can toggle comments from me using the |
Message from Brad Fitzpatrick: Patch Set 1: Look at memory allocations in the benchmarks too. I suspect KeepAlive is allocating, boxing the values into an interface{}. You might try doing: x := Atanh(tt.in) ... to use a pointer to KeepAlive instead. Maybe the compiler deals with that better without the allocation. File a compiler bug, though. Please don’t reply on this GitHub thread. Visit golang.org/cl/188437. |
Message from Neven Sajko: Patch Set 1:
There were no allocations, but there does seem to be an issue, so I filed one on Github: #33442 Please don’t reply on this GitHub thread. Visit golang.org/cl/188437. |
53bd915
to
6139019
Compare
4a7ed1f
to
0f992b9
Compare
Use runtime.KeepAlive for preventing benchmarked code from being
eliminated as "dead" code (by marking its return values as live),
instead of relying on "package level exported sink variable" hacks that
are far from obvious to those not familiar with the internals of the Go
compilers and also do not seem to be robust in the long term (I mean,
it seems like an update to one of the Go implementations could easily
break benchmarks by doing dead code elimination of benchmarked code
regardless of those sink variables).
The discussion that led to this patch happening:
#33325
Also changed the BenchmarkDim benchmark a bit since it seemed very
possible it would be optimized out otherwise (because subtracting zero
from any number is a non-operation).
Some other benchmarks other than BenchmarkDim also experienced speedups
or slowdowns, which is quite mysterious to me in some cases (although I
have not checked the assembly) because KeepAlive is supposed to be
intrinsified?
Benchstat output, benchmarked with tip on Linux AMD64 on Intel i5-8300H:
name old time/op new time/op delta
Acos-8 13.6ns ± 0% 13.8ns ± 0% +1.72% (p=0.000 n=13+15)
Acosh-8 18.7ns ± 1% 19.1ns ± 0% +2.37% (p=0.000 n=14+13)
Asin-8 10.7ns ± 1% 11.0ns ± 1% +3.33% (p=0.000 n=14+15)
Asinh-8 24.6ns ± 1% 25.2ns ± 0% +2.44% (p=0.000 n=15+15)
Atan-8 6.68ns ± 0% 6.96ns ± 0% +4.24% (p=0.000 n=14+15)
Atanh-8 20.9ns ± 0% 21.1ns ± 0% +0.89% (p=0.000 n=15+15)
Atan2-8 12.4ns ± 1% 12.7ns ± 0% +2.36% (p=0.000 n=15+15)
Cbrt-8 10.9ns ± 0% 11.9ns ±11% +9.54% (p=0.000 n=13+15)
Ceil-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal)
Copysign-8 0.75ns ± 1% 0.75ns ± 0% ~ (p=0.902 n=14+16)
Cos-8 9.76ns ± 0% 9.88ns ± 0% +1.23% (p=0.000 n=15+13)
Cosh-8 13.7ns ± 2% 13.8ns ± 0% +0.83% (p=0.001 n=15+13)
Erf-8 6.86ns ± 0% 7.19ns ± 0% +4.84% (p=0.000 n=14+14)
Erfc-8 8.10ns ± 1% 8.29ns ± 1% +2.31% (p=0.000 n=15+16)
Erfinv-8 8.78ns ± 0% 9.08ns ± 0% +3.33% (p=0.000 n=14+14)
Erfcinv-8 8.95ns ± 0% 9.07ns ± 0% +1.39% (p=0.000 n=14+15)
Exp-8 8.56ns ± 0% 8.60ns ± 0% +0.48% (p=0.000 n=14+15)
ExpGo-8 22.5ns ± 1% 22.3ns ± 0% -0.54% (p=0.000 n=16+15)
Expm1-8 11.8ns ± 0% 12.1ns ± 0% +2.54% (p=0.000 n=16+14)
Exp2-8 20.8ns ± 0% 21.3ns ± 1% +2.23% (p=0.000 n=14+14)
Exp2Go-8 20.9ns ± 0% 21.7ns ± 1% +3.88% (p=0.000 n=14+15)
Abs-8 0.38ns ± 0% 0.38ns ± 0% ~ (p=0.082 n=15+13)
Dim-8 0.63ns ± 0% 1.21ns ±47% +92.18% (p=0.000 n=16+16)
Floor-8 2.01ns ± 1% 2.01ns ± 0% -0.14% (p=0.000 n=14+16)
Max-8 2.44ns ±12% 2.55ns ±31% ~ (p=0.187 n=16+14)
Min-8 3.02ns ±49% 3.41ns ±44% ~ (p=0.115 n=16+16)
Mod-8 41.9ns ±32% 37.4ns ±34% ~ (p=0.115 n=16+16)
Frexp-8 5.17ns ±39% 4.31ns ±26% ~ (p=0.116 n=16+14)
Gamma-8 11.5ns ±54% 10.8ns ±49% ~ (p=0.291 n=16+16)
Hypot-8 3.93ns ±23% 3.22ns ± 3% ~ (p=0.087 n=16+13)
HypotGo-8 5.92ns ±27% 6.25ns ±45% ~ (p=0.830 n=16+16)
Ilogb-8 3.59ns ±49% 3.73ns ±46% +4.01% (p=0.016 n=16+16)
J0-8 50.0ns ± 0% 59.9ns ±47% +19.74% (p=0.000 n=13+16)
J1-8 50.1ns ± 0% 50.8ns ± 0% +1.40% (p=0.000 n=13+13)
Jn-8 108ns ± 0% 108ns ± 0% ~ (all equal)
Ldexp-8 5.67ns ±15% 5.56ns ± 0% ~ (p=0.831 n=13+15)
Lgamma-8 14.3ns ±52% 13.5ns ±49% ~ (p=0.316 n=16+16)
Log-8 8.95ns ± 1% 10.24ns ±46% +14.49% (p=0.000 n=13+15)
Logb-8 4.00ns ±48% 3.92ns ±50% ~ (p=0.387 n=16+15)
Log1p-8 16.7ns ±43% 13.6ns ± 1% ~ (p=0.241 n=16+13)
Log10-8 11.2ns ± 2% 11.6ns ± 0% +3.43% (p=0.000 n=13+14)
Log2-8 5.90ns ±48% 6.32ns ±45% ~ (p=0.130 n=15+16)
Modf-8 4.73ns ±31% 4.70ns ±28% ~ (p=0.993 n=16+16)
Nextafter32-8 4.66ns ±37% 4.55ns ±31% ~ (p=0.299 n=16+16)
Nextafter64-8 3.37ns ±15% 3.95ns ±44% ~ (p=0.133 n=14+16)
PowInt-8 32.4ns ±34% 31.9ns ±37% ~ (p=0.830 n=16+16)
PowFrac-8 66.7ns ± 0% 77.3ns ±50% ~ (p=0.160 n=13+16)
Pow10Pos-8 1.01ns ± 1% 1.26ns ±21% +25.36% (p=0.000 n=16+15)
Pow10Neg-8 1.54ns ±41% 1.74ns ±28% ~ (p=0.069 n=16+16)
Round-8 2.53ns ±39% 2.47ns ±41% ~ (p=0.229 n=16+16)
RoundToEven-8 0.57ns ± 1% 0.63ns ± 2% +11.42% (p=0.000 n=13+13)
Remainder-8 27.4ns ± 1% 26.9ns ± 0% -1.80% (p=0.000 n=13+13)
Signbit-8 0.50ns ± 1% 0.75ns ± 0% +49.28% (p=0.000 n=13+13)
Sin-8 9.09ns ± 0% 9.49ns ±19% +4.45% (p=0.000 n=15+13)
Sincos-8 11.5ns ± 1% 11.7ns ± 0% +1.73% (p=0.000 n=15+13)
Sinh-8 13.7ns ± 3% 13.9ns ± 0% ~ (p=0.628 n=16+15)
SqrtIndirect-8 2.26ns ± 1% 2.45ns ±32% ~ (p=0.075 n=14+16)
SqrtLatency-8 3.27ns ± 1% 3.51ns ±46% ~ (p=0.490 n=14+14)
SqrtIndirectLatency-8 5.78ns ± 0% 6.03ns ± 0% +4.33% (p=0.000 n=16+16)
SqrtGoLatency-8 34.7ns ± 0% 35.0ns ± 0% +0.88% (p=0.000 n=16+16)
SqrtPrime-8 2.53µs ± 0% 2.56µs ± 0% +1.34% (p=0.000 n=16+16)
Tan-8 9.59ns ± 0% 9.88ns ± 0% +3.04% (p=0.000 n=16+15)
Tanh-8 14.3ns ± 3% 14.5ns ± 0% +1.58% (p=0.001 n=16+13)
Trunc-8 2.01ns ± 0% 2.01ns ± 0% ~ (all equal)
Y0-8 49.0ns ± 1% 49.7ns ± 0% +1.56% (p=0.000 n=16+15)
Y1-8 49.0ns ± 1% 49.7ns ± 0% +1.28% (p=0.000 n=16+15)
Yn-8 106ns ± 0% 106ns ± 0% ~ (all equal)
Float64bits-8 0.50ns ± 0% 0.30ns ± 1% -40.17% (p=0.000 n=16+16)
Float64frombits-8 0.50ns ± 0% 0.28ns ± 0% -43.82% (p=0.000 n=15+16)
Float32bits-8 0.25ns ± 0% 0.28ns ± 0% +12.25% (p=0.000 n=16+12)
Float32frombits-8 0.25ns ± 0% 0.25ns ± 0% ~ (p=0.704 n=16+16)