-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-30102: Improve libssl performance on POWER8 for, e.g sha256 #1181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
To correctly pick the best algorithm for the current architecture, libssl needs to have OPENSSL_config(NULL) called as described on: https://wiki.openssl.org/index.php/Libcrypto_API This short change lead to a speedup of 50% on POWER8 when using hashlib.sha256 digest functionality as it now uses a SIMD approach that was already existing but not used by cpython.
@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers. |
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again to your contribution and we look forward to looking at it! |
Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find |
Apparently Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace |
Modules/_hashopenssl.c
Outdated
* | ||
* Source: https://wiki.openssl.org/index.php/Libcrypto_API | ||
*/ | ||
OPENSSL_config(NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see this call in PyInit__hashlib() after ERR_load_crypto_strings().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I tested and I still have the results I expected.
However OPENSSL_config(NULL);
is deprecated so I'll send this change as soon as I figure it out which interface should I use instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@pitrou : I found a useful documentation talking about the deprecated interface: However I didn't have the same results when using |
Both solutions lead to a performance improvement on POWER8 when using sha digest
BTW, is there a deinitialization hook in order to call |
Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint? |
The requirement to call |
Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions? |
As for the Windows issue, perhaps you can try applying the following diff:
If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this. |
it worked! Thanks. |
@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86? |
@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on... |
@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86. |
@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch. |
@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch. Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop. Thanks for the script |
Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html
|
Very interesting. I was testing with However besides init and deinit performed automatically as stated on: It also states that some initialization are only performed explicitly. E.g: (emphasis added by me) Which looks like what I'm trying to initialize on this PR. I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered. |
I don't think that I don't trust |
BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256. |
Awesome! Thanks for you hard work! :) I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0. |
Replace ENGINE_load_builtin_engines to OPENSSL_cpuid_setup. It leads to the performance gain on hashing on POWER8 platform. I don't like the explicit function declaration but I didn't find it on OpenSSL headers.
@tiran Don't worry. I see your reasons for being picky. I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:
One drawback though is that I don't like to declare the function |
Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first. Thanks in advance |
Modules/_ssl.c
Outdated
@@ -137,6 +138,8 @@ struct py_ssl_library_code { | |||
#else /* OpenSSL < 1.1.0 */ | |||
#if defined(WITH_THREAD) | |||
#define HAVE_OPENSSL_CRYPTO_LOCK | |||
/* For some reason, this function is not declared on OpenSSL's headers */ | |||
void OPENSSL_cpuid_setup(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to only define this prototype if compiling with threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it was not. I just noticed it is inside of this definition. Great catch!
It is intended to be defined only when /* OpenSSL < 1.1.0 */
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks
Modules/_ssl.c
Outdated
#endif | ||
|
||
/* For some reason, this function is not declared on OpenSSL's headers */ | ||
void OPENSSL_cpuid_setup(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function guaranteed to be available on all versions of OpenSSL and LibreSSL that are supported by Python? At the moment it's 0.9.8 to 1.1.x and all versions of LibreSSL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a nice way to verify that? I see that you have a project to compile a set of libressl and openssl versions.
Checking on OpenSSL, I notice that the commit that implemented this function is dated back to 2004 when 0.9.8 version didn't exist.
And on Libressl, it was added on 2008.
Would that be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, isn't Travis CI already checking these things?
just solved the conflict: now the Is there anything else missing? @tiran : please comment my previous reply to your review. Is that enough? |
I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on. |
To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API
This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.
https://bugs.python.org/issue30102