encoding/base64: encoding is slow #20206
The following patch, https://go-review.googlesource.com/42410, optimises the base64 encoder to use SSSE3 instructions on amd64. This new SSSE3 encoder is 3 times faster than the existing Go encoder on a MacBook Pro with an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz running Sierra.
It is, however, slightly slower (~5%) on amd64 when used to encode data less than 12 bytes in size. This is mainly due to the extra function call, encodeAccelerated, which is not inlined on amd64. If this is not acceptable, I could completely replace Encoding.Encode on amd64, but doing so would involve some code duplication. I'm also working on an AVX2 version of the encoder, which I will submit once the SSSE3 patch has been reviewed.
If you're talking about the assembly function not being inlineable, I think that's expected now, but I'm not sure if it can be fixed in the future. You could always try a hybrid that uses SIMD for sizes >= N, and the existing code for the rest. Since we're in the freeze though, this would be for 1.10.
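A minimal sketch of that hybrid dispatch, assuming a hypothetical assembly routine encodeSSSE3 and an illustrative 12-byte cutoff (neither name nor threshold is from the actual CL):

```go
package base64simd

// encodeSSSE3 would be implemented in amd64 assembly; encodeGeneric
// stands in for the existing pure Go loop. Both are hypothetical stubs.
func encodeSSSE3(dst, src []byte)   {}
func encodeGeneric(dst, src []byte) {}

// encode takes the SIMD path only when the input is large enough to
// amortize the cost of the non-inlinable assembly call.
func encode(dst, src []byte) {
	const simdThreshold = 12 // illustrative cutoff; tune via benchmarks
	if len(src) >= simdThreshold {
		encodeSSSE3(dst, src)
		return
	}
	encodeGeneric(dst, src)
}
```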
CL https://golang.org/cl/42410 mentions this issue.
I guess the main argument for optimizing the current base64 package would be that it appears to be quite widely used¹ and yet is 3-4 times slower on modern hardware than it needs to be. The problem, though, is that to get this 3-4x speedup on a single core we would need to use SIMD-optimized algorithms. I don't currently see a way to take advantage of the tricks used by these algorithms in pure Go, hence the assembly language patch. I appreciate that assembly language comes with a maintenance burden, as noted in the comments on the patch in Gerrit, and that this probably makes the patch un-mergeable. It may still make sense to keep the bug open, however, given the relatively poor performance of the current encoder. Note, there is a similar issue (#19636) that references the same SIMD algorithms for the base64 decoder.
¹ At the time of writing, Godoc reports 11773 importers of encoding/base64. To put this in context, there are 1131 importers of aes, 3065 of encoding/gob, and 56012 of encoding/json.
Based on Google-Wide Profiling at least, base64 decoding in the standard library code appears to be much hotter than encoding.
Assuming this is representative of usage as a whole (which seems somewhat fair, considering this is sampling all Go applications inside Google), effort would be best spent optimizing decode. We can definitely keep this tracking bug open, though. It just might not deserve assembly.
I have some SSE code for the decoding that gives a nice perf boost (~3 GB/s) without using big lookup tables, but I stopped working on that code after my patch for SSE byte counting was accepted, as I didn't want to add yet more assembly code to Go.
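For reference, the scalar work such a SIMD decoder vectorizes looks roughly like this per 4-character group. This is a simplified sketch that ignores padding and the URL-safe alphabet; SIMD decoders perform the same sextet gathering for 16 or 32 characters per iteration:

```go
package main

import "fmt"

// decodeMap maps each base64 character to its 6-bit value; 0xFF marks
// invalid input bytes. Built for the standard alphabet only.
var decodeMap [256]byte

func init() {
	const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
	for i := range decodeMap {
		decodeMap[i] = 0xFF
	}
	for i := 0; i < len(alphabet); i++ {
		decodeMap[alphabet[i]] = byte(i)
	}
}

// decodeQuantum turns 4 base64 characters into 3 output bytes by
// accumulating four 6-bit values into a 24-bit group.
func decodeQuantum(dst, src []byte) bool {
	var n uint32
	for i := 0; i < 4; i++ {
		v := decodeMap[src[i]]
		if v == 0xFF {
			return false // invalid character
		}
		n = n<<6 | uint32(v)
	}
	dst[0] = byte(n >> 16)
	dst[1] = byte(n >> 8)
	dst[2] = byte(n)
	return true
}

func main() {
	var out [3]byte
	if decodeQuantum(out[:], []byte("TWFu")) {
		fmt.Printf("%s\n", out[:]) // prints: Man
	}
}
```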
While in CPU cache, you should be able to encode and decode base64 at something like 0.25 CPU cycles per byte, or around 10 GB/s on a recent Intel/AMD processor. It should be so fast that you should literally have a hard time measuring the running time; it is only a few times slower than a memory copy. The following paper might be relevant: Faster Base64 Encoding and Decoding Using AVX2 Instructions.
cc @WojciechMula
I think there's still quite a bit to be optimized in pure Go. For example, I just found a 6% performance win with a single line to get rid of a nil check in the encoding hot loop :) I'll send that CL shortly.
Change https://golang.org/cl/151158 mentions this issue:
Most of the encoding time is spent in the first Encode loop, since the rest of the function only deals with the few remaining bytes. Any unnecessary work done in that loop body matters tremendously.

One such unnecessary bottleneck was the use of the enc.encode table. Since enc is a pointer receiver, and the field is first used within the loop, the encoder must perform a nil check at every iteration. Add a dummy use of the field before the start of the loop, to move the nil check there. After that line, the compiler now knows that enc can't be nil, and thus the hot loop is free of nil checks.

name              old time/op   new time/op   delta
EncodeToString-4  14.7µs ± 0%   13.7µs ± 1%   -6.53%  (p=0.000 n=10+10)

name              old speed     new speed     delta
EncodeToString-4  559MB/s ± 0%  598MB/s ± 1%  +6.99%  (p=0.000 n=10+10)

Updates #20206.

Change-Id: Icbb523a7bd9e470a8be0a448d1d78ade97ed4ff6
Reviewed-on: https://go-review.googlesource.com/c/151158
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
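In spirit, the change is a single line before the hot loop. Here is a condensed sketch against a simplified Encoding type, with error handling and the standard library's remainder/padding logic omitted:

```go
package base64

type Encoding struct {
	encode [64]byte // the alphabet; other fields omitted
}

// Encode shows where the dummy use of enc.encode goes. Without it, the
// first read of enc.encode happens inside the loop, so the compiler
// re-checks enc != nil on every iteration.
func (enc *Encoding) Encode(dst, src []byte) {
	if len(src) == 0 {
		return
	}
	_ = enc.encode // hoists the nil check out of the loop below

	di, si := 0, 0
	n := (len(src) / 3) * 3
	for si < n {
		// Pack 3 input bytes into 24 bits, then index the alphabet
		// with each 6-bit group.
		val := uint(src[si+0])<<16 | uint(src[si+1])<<8 | uint(src[si+2])
		dst[di+0] = enc.encode[val>>18&0x3F]
		dst[di+1] = enc.encode[val>>12&0x3F]
		dst[di+2] = enc.encode[val>>6&0x3F]
		dst[di+3] = enc.encode[val&0x3F]
		si += 3
		di += 4
	}
	// Trailing 1 or 2 bytes and padding are handled after the loop (omitted).
}
```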
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go 1.8.1
What operating system and processor architecture are you using (go env)?
amd64, darwin
What did you do?
go test encoding/base64 --bench=BenchmarkEncodeToString
What did you expect to see?
It should be possible to increase the speed of the encoder by 3 or 4 times by using SIMD instructions. So ideally, I'd like to see something like:
go test --bench=BenchmarkEncodeToString
BenchmarkEncodeToString-8 500000 3777 ns/op 2168.51 MB/s
PASS
What did you see instead?
BenchmarkEncodeToString-8 200000 11393 ns/op 719.01 MB/s
PASS
ok encoding/base64 2.406s