-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: new goroutines can spend excessive time in morestack #18138
Comments
/cc @aclements @randall77 |
We've seen this a few times now. I'm not sure what the right answer is. My best thought so far is that the runtime could keep track of when particular go statements always lead to stack growth right away (for some value of "right away" and "always") and learn to start goroutines from that site with a larger stack. Of course, it would be hard to make this behavior predictable, but perhaps it would still be less surprising than the current behavior. If the runtime did learn to start a goroutine with a larger stack, it would still need a signal to learn if the stack should get smaller again, but we could do that efficiently by allocating the larger stack but setting the stack bounds to something smaller. Then the runtime could still observe whether or not the stack needs to grow, but the actual growth would be basically free until it reached the size of the allocation. @randall77, thoughts, ideas? /cc @RLH |
I like @petermattis idea of being able to hint stack size on a per goroutine basis, although this implies developers have the knowhow to identify and provide size estimates accurately. Could this be done with a compiler directive? |
We don't want compiler directives in code. We have some used by the runtime out of necessity, but they're gross. Go prefers simplicity over tons of knobs. |
Yes, please just make my code magically faster as you've been doing for the last several Go releases. |
I generally agree with not having compiler directives ... magic is nice, although they (compiler directives) do exist even in go. It's an interesting opportunity either way you decide. |
@bradfitz, your comment prompted me to look for the go guiding principles ( https://golang.org/doc/faq#principles). Thanks @adg as well for nicely worded principles. |
CL https://golang.org/cl/45142 mentions this issue. |
@petermattis (or anyone who has a good reproducer for this), would you be able to try https://go-review.googlesource.com/45142? It's a trivial hack, but it might actually do the trick. I haven't benchmarked it on anything, so it may also slow things down. |
@aclements I'll try and test either tomorrow or next week. |
@aclements Applying that patch to go1.8.3 resulted in no benefit (this is with the
Surprising to me this didn't have any effect. Compare this to the
|
CL 43150 might help a little here. |
Sorry, I made a silly mistake in CL 45142. Would you mind trying the new version of that CL? |
With your updated patch against go-tip (f363817) there is an improvement:
But the improvement is still not as good as the
There is a little more performance if we increase the initial stack size to 32 KB:
Interestingly, all of these timings are lower than with go1.8.3. |
Nothing to see here. This appears to be due to a change on our code between what I tested earlier today and now. |
I did some more testing of this patch and the performance improvements carry over to production settings. |
It is early in the 1.10 cycle and wanted to bring this issue forward again. See cockroachdb/cockroach#17242 for a graph showing the benefit of a larger initial stack size. |
Is there any update on this issue? A larger initial goroutine stack size provides a nice performance boost for our system. |
Just chiming in that I'm still working around this issue in my performance-sensitive applications manually for the same reasons mentioned above, an average over the entire application is not reflective of the needs of my critical path. |
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. Before: ``` ❯ go test -bench BenchmarkAny -benchmem -run Any goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAnyInGoroutine/any-12 3977013 316.1 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/int-12 4172178 289.6 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/goroutine-12 5018606 253.1 ns/op 0 B/op 0 allocs/op BenchmarkAnyInGoroutine/int-in-go-12 2167634 561.5 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-12 1875784 637.6 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/int-in-go-with-stack-12 2500544 446.7 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-with-stack-12 1769258 653.0 ns/op 88 B/op 2 allocs/op PASS ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run Any goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAnyInGoroutine/any-12 3967261 288.2 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/int-12 4571604 260.6 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/goroutine-12 5481904 216.4 ns/op 0 B/op 0 allocs/op BenchmarkAnyInGoroutine/int-in-go-12 2097002 583.1 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-12 2164167 551.6 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/int-in-go-with-stack-12 2989617 378.2 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-with-stack-12 3123987 381.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 11.486s ``` 10 runs. ``` ❯ benchstat before.txt after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before.txt │ after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 290.0n ± 1% -5.00% (p=0.000 n=10) AnyInGoroutine/int-12 288.0n ± 0% 265.5n ± 1% -7.85% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 216.9n ± 2% ~ (p=0.469 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 578.5n ± 0% -2.40% (p=0.001 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 557.8n ± 1% -16.30% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 470.3n ± 6% ~ (p=0.631 n=10) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 475.9n ± 3% -22.95% (p=0.000 n=10) geomean 417.8n 382.8n -8.36% │ before.txt │ after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ before.txt │ after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
It would be useful if someone still having problems with the stack size estimator could post a benchmark that demonstrates the problem. (Not that we don't believe you, it's just hard to iterate on solutions without something to measure!) Stack sizing is definitely something we've considered for PGO, but also it seems like something where, anything we could do with PGO, we could probably do on the fly. This would have a slower start, but could also be more adaptive. |
@aclements sorry - I don't have an issue with the stack size estimator anymore, but that's just because I moved from Couchbase to Ditto, and I am now battling with Rust instead ;-) |
A commonly proposed refinement to the existing average stack size adjustment is to track average stack size based on goroutine start PCs or perhaps one or more frames at the point the goroutine is created. One place where I worry this won't work well are things like gRPC servers. This servers launch one goroutine per request, and some of these requests may require lots of stack space ( As far as I can tell [1], gRPC starts this per-request goroutine before determining which request it is handling. This means that no matter the amount of context the runtime uses prior to starting the goroutine, it won't be able to deterministically differentiate between the different types of requests. This is rather unfortunate, though perhaps if we did this projects like gRPC would be motivated to refactor and start unique goroutine start PCs for each request type. [1] I've love to be wrong here, but I believe https://github.com/grpc/grpc-go/blob/f1a1fcd042af436be0eb3ac08ed81e2e29fcde4f/server.go#L1024 is the point the goroutine is started. |
The first time a stack is resized the runtime would likely have sufficient stack context to know what to resize it to. When I posted this issue originally what I noticed is that it was the growth from 2KB -> 4KB -> 8KB -> 16 KB -> 32KB that was slow. The |
What version of Go are you using (
go version
)?go version devel +41908a5 Thu Dec 1 02:54:21 2016 +0000 darwin/amd64
a.k.ago1.8beta1
What operating system and processor architecture are you using (
go env
)?What did you do?
A recent change to github.com/cockroachdb/cockroach replaced a synchronous call with one wrapped in a goroutine. This small change resulted in a significant slowdown in some benchmarks. The slowdown was traced to additional time being spent in
runtime.morestack
. The problematic goroutines are all hitting a single gRPC entrypointServer.Batch
and the code paths that fan out from this entrypoint tend to use an excessive amount of stack due to an over reliance on passing and returning by value instead of using pointers. Typical calls use 16-32 KB of stack.The expensive part of
runtime.morestack
is the adjustment of existing pointers on the stack. And due to the incremental nature of the stack growth, I can see the stack growing in 4 steps from 2 KB to 32 KB. So we experimented with a hack to pre-grow the stack. Voila, the performance penalty of the change disappeared:old
are benchmarks gathered using go1.8beta1 andnew
are on go1.8beta1 with the hack to pre-grow the stack. The hack is a call at the beginning ofserver.Batch
to agrowStack
method:The question here is whether this is copacetic and also to alert the runtime folks that there is a performance opportunity here. Note that the
growStackGlobal
is not currently necessary, but I wanted to future proof against the compiler deciding thatbuf
is not necessary.Longer term, the stack usage under
server.Batch
should be reduced on our side. I'm guessing that we could get the stack usage down to 8-16 KB without too many contortions. But even with such reductions, being able to pre-grow the stack for a goroutine looks beneficial.The text was updated successfully, but these errors were encountered: