-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
src: enable avx512 optimization for base64 encoding #43717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/base64-inl.h
Outdated
@@ -182,6 +211,96 @@ inline size_t base64_encode(const char* src, | |||
return dlen; | |||
} | |||
|
|||
|
|||
#if (defined(__x86_64) || defined(__x86_64__)) && \ | |||
(defined(__linux) || defined(__linux__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this operation wouldn't apply outside of Linux? Should this be a check for gcc/clang instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code is for Linux GCC and I do not have a windows to test for it. I would try to remove the line 216 and add gcc/clang check and let the bot to test it. Thanks!
|
||
#if (defined(__x86_64) || defined(__x86_64__)) && \ | ||
(defined(__linux) || defined(__linux__)) | ||
#pragma GCC target("avx512vl", "avx512vbmi") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a bit cleaner to use __attribute__((target(...)))
on the function itself? Or is this necessary for the #include to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose #pragma so to explicitly reset the pragma to avoid polution on other code.
FWIW it might be better to go with something that is not so exclusive platform/CPU feature-wise. Awhile back I submitted #39775 which uses https://github.com/aklomp/base64 for broader platform/CPU support. I'll try and work on bringing that PR closer to being landable. |
Optimized the Base64 encoding by AVX512VL and AVX512VBMI instructions. The measurement of benchmark/buffers/buffer-base64-encoding on AWS EC2 m6i.large shows 2.4X performance (+140% gain). This optimization only applies to Linux X86_64 at present.
The considerations were that: AVX512VL is not a new technology (after Cannonlake in 2018) and the main computing intensive AWS EC2 instances are based on Icelake. The users even care about the performance difference of Base64 encoding may mainly work on new arch machine and have requirements of top performance of Node.js. So the purpose of this optimization is to provide the fastest option for base64 encoding with least code change for those users's requirement. AFAIK WojciechMula's AVX512VL solution is the fastest algorithm which is faster than AVX512F/AVX512VBMI. AKLOMP's solution is good to support a variaty platforms but lack of the top performance option so I did not borrow that solution in this optimization. |
Resubmit to modify the first commit msg. |
Optimized the Base64 encoding by AVX512VL instructions. Purpose of this opt is not to provide a generic solution for variety of platforms, but for modern cpu archs after Cannonlake which has been widly employeed by AWS and other CSPs and for the users who want to obtain the known best performance of Node.js. The measurement of buffer-base64-encoding on AWS EC2 m6i.large instance shows 2.4X performance (+140% gain). This opt only applies to X86_64 with GNU compiler at present.
Optimized the Base64 encoding by AVX512VL and AVX512VBMI instructions. The measurement of benchmark/buffers/buffer-base64-encoding on AWS EC2 m6i.large shows 2.4X performance (+140% gain). This optimization only applies to Linux X86_64 at present.
The algorithm is based on open project of https://github.com/WojciechMula/base64-avx512 with some modifications to fit in Node.js code base. I have added the original BSD-3 license in the patch.