Skip to content

bpo-30102: Improve libssl performance on POWER8 for, e.g sha256 #1181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

gut
Copy link

@gut gut commented Apr 19, 2017

To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API

This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.

https://bugs.python.org/issue30102

To correctly pick the best algorithm for the current architecture,
libssl needs to have OPENSSL_config(NULL) called as described on:
https://wiki.openssl.org/index.php/Libcrypto_API

This short change lead to a speedup of 50% on POWER8 when using
hashlib.sha256 digest functionality as it now uses a SIMD approach that
was already existing but not used by cpython.
@mention-bot
Copy link

@gut, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tiran, @loewis and @teoliphant to be potential reviewers.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@gut
Copy link
Author

gut commented Apr 19, 2017

Can anybody support me on this AppVeyor build fail? It looks like on windows build it's not linking because it doesn't find _OPENSSL_config symbol. Should I #ifdef the block to linux only or what?

@pitrou
Copy link
Member

pitrou commented Apr 19, 2017

Apparently OPENSSL_config is deprecated and has been mistakenly removed from Windows builds, see openssl/openssl#684 and openssl/openssl@c35f5c3.

Perhaps @zware or @pfmoore can chime in on whether our current OpenSSL version on Windows is subject to this issue. Regardless, the aforelinked changeset shows how to replace OPENSSL_config with the newer idiom?

*
* Source: https://wiki.openssl.org/index.php/Libcrypto_API
*/
OPENSSL_config(NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see this call in PyInit__hashlib() after ERR_load_crypto_strings().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I tested and I still have the results I expected.
However OPENSSL_config(NULL); is deprecated so I'll send this change as soon as I figure it out which interface should I use instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gut
Copy link
Author

gut commented Apr 20, 2017

@pitrou : I found a useful documentation talking about the deprecated interface:
https://github.com/openssl/openssl/blob/cda3ae5bd0798c56fef5a5c1462d51ca1776504e/doc/crypto/OPENSSL_config.pod#notes

However I didn't have the same results when using CONF_modules_load_file(NULL, NULL, 0); instead. I'm checking it...

Both solutions lead to a performance improvement on POWER8 when using
sha digest
@gut
Copy link
Author

gut commented Apr 20, 2017

BTW, is there a deinitialization hook in order to call ENGINE_cleanup();? As a python user, I don't recall such a feature however there might be something on cpython's core... like after the whole module was garbage collected?
https://wiki.openssl.org/index.php/Manual:Engine(3)

@gut
Copy link
Author

gut commented Apr 20, 2017

Another question: I thought it would fix windows build but now some other symbols are not being found at link time but I don't see it related to my change. Does anybody have a hint?

@pitrou
Copy link
Member

pitrou commented Apr 20, 2017

The requirement to call ENGINE_cleanup() sounds a bit surprising. Wouldn't the "engines" simply be deallocated at process exit?

@pitrou
Copy link
Member

pitrou commented Apr 20, 2017

I thought it would fix windows build but now some other symbols are not being found at link time

Those seem to be certificate handling function symbols but the module being built is hashlib, why is hashlib using those functions?

@pitrou
Copy link
Member

pitrou commented Apr 20, 2017

As for the Windows issue, perhaps you can try applying the following diff:

diff --git a/PCbuild/_hashlib.vcxproj b/PCbuild/_hashlib.vcxproj
index b1300cb..704527c 100644
--- a/PCbuild/_hashlib.vcxproj
+++ b/PCbuild/_hashlib.vcxproj
@@ -64,7 +64,7 @@
       <AdditionalIncludeDirectories>$(opensslIncludeDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
     </ClCompile>
     <Link>
-      <AdditionalDependencies>ws2_32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
+      <AdditionalDependencies>ws2_32.lib;crypt32.lib;$(OutDir)libeay$(PyDebugExt).lib;$(OutDir)ssleay$(PyDebugExt).lib;%(AdditionalDependencies)</AdditionalDependencies>
     </Link>
   </ItemDefinitionGroup>
   <ItemGroup>

If it doesn't work, perhaps @zware, @pfmoore or @zooba (our resident Windows experts) can give a hand to debug this.

@gut
Copy link
Author

gut commented Apr 20, 2017

As for the Windows issue, perhaps you can try applying the following diff:

it worked! Thanks.

@pitrou
Copy link
Member

pitrou commented Apr 20, 2017

Great! Perhaps @Haypo and @tiran give the PR another look now that it passes the CI tests.

@pitrou
Copy link
Member

pitrou commented Apr 21, 2017

@gut, do you know whether this issue only applies to POWER8, or does it also affect e.g. AES performance on x86?

@gut
Copy link
Author

gut commented Apr 24, 2017

@pitrou This would be a good question to an OpenSSL expert. Well, I'll try some benchmarking on X86 and come back to you. Hang on...

@gut
Copy link
Author

gut commented Apr 24, 2017

@pitrou I am more focused on hashing and I didn't notice different instructions or performance related to python or from openssl directly on X86.
As for AES, could you benchmark it with the change? I don't have a benchmark environment set and I can't spend the time for doing that at the moment.

@gut
Copy link
Author

gut commented Apr 25, 2017

@Haypo @tiran : ping. Any other comments?

@pitrou
Copy link
Member

pitrou commented Apr 25, 2017

@gut, I am seeing around 760 MB/s here (Core i5-2500K), with and without the patch.
The script I'm using is here: https://gist.github.com/pitrou/0c3586abb17d6a00853509e0afda9a68, perhaps you want to give it a quick try on POWER8.

@gut
Copy link
Author

gut commented Apr 25, 2017

@pitrou Hi. On my POWER8 I also noticed ~750 MB/s with and without patch.

Interestingly the python version 3.5.3 found on Ubuntu 17.04 outputs ~860 MB/s. Maybe it's just my build but I didn't expect a performance drop.

Thanks for the script

@tiran
Copy link
Member

tiran commented Apr 25, 2017

Did you test your patch with OpenSSL 1.1.0? With 1.1.0, OpenSSL should automatically initialize all subsystems including the engines, https://www.openssl.org/docs/man1.1.0/crypto/OPENSSL_init_crypto.html

#ifndef OPENSSL_VERSION_1_1
OpenSSL_add_all_digests();
ERR_load_crypto_strings();
ENGINE_load_builtin_engines();
#endif

@gut
Copy link
Author

gut commented Apr 25, 2017

Very interesting. I was testing with libssl1.0.0 version 1.0.2g-1ubuntu11.

However besides init and deinit performed automatically as stated on:
As of version 1.1.0 OpenSSL will automatically allocate all resources that it needs so no explicit initialisation is required. Similarly it will also automatically deinitialise as required.

It also states that some initialization are only performed explicitly. E.g: (emphasis added by me)
OPENSSL_INIT_ENGINE_ALL_BUILTIN
With this option the library will automatically load and initialise all the built in engines listed above with the exception of the openssl and dasync engines. This not a default option.

Which looks like what I'm trying to initialize on this PR.

I'd suggest to update these initialization interfaces of OpenSSL 1.1.0 on another PR, what about it? It looks like there are other initialization aspects to be considered.

@tiran
Copy link
Member

tiran commented Apr 26, 2017

I don't think that OPENSSL_INIT_ENGINE_ALL_BUILTIN is a good idea, either. It loads too many engines. Hashing and crypto accelerator engines are covered by OPENSSL_INIT_ADD_ALL_CIPHERS and OPENSSL_INIT_ADD_ALL_DIGEST, both are default. The ALL_BUILTIN option also loads and potentially enables RDRAND on Intel, CRYPTODEV on BSD, PADLOCK on VIA CPUs, CAPI on Windows and more.

I don't trust RDRAND and so do countless people which more security experience than me. Before I agree on ENGINE_load_builtin_engines(), we must make sure that the call does not register any of the extra engines and set them as default provider for ciphers, digests, RSA, ECDSA, RNG and so on.

@gut
Copy link
Author

gut commented Apr 26, 2017

BTW, I was testing the new openssl v1.1 and I ran a short example without any of this special initialization (engines) and it still used the optimized version just like I wanted! So don't need to init an engine with OPENSSL_INIT_ENGINE_ALL_BUILTIN.

As for prior OpenSSL version, I'll check how can I avoid loading all engines and still having the optimized code for, e.g. sha256.

@tiran
Copy link
Member

tiran commented Apr 27, 2017

Awesome! Thanks for you hard work! :)

I'm sorry if my reviews sound harsh to you. Crypto and security is a mine field. OpenSSL is a pretty nasty and highly explosive mine field. As you have noticed, it is getting better with OpenSSL >= 1.1.0.

Replace ENGINE_load_builtin_engines to OPENSSL_cpuid_setup. It leads to
the performance gain on hashing on POWER8 platform.

I don't like the explicit function declaration but I didn't find it on
OpenSSL headers.
@gut
Copy link
Author

gut commented Apr 28, 2017

@tiran Don't worry. I see your reasons for being picky.

I analysed OpenSSL closely and I noticed that only the platform detection is important for me. I'll quote the related code below:

  1. Calling OPENSSL_cpuid_setup on ppc sets this HW detection flag: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L342
  2. That flag can be used to call a more efficient implementation on POWER8 on hashing functions like sha256_block_data_order: https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L69

One drawback though is that I don't like to declare the function OPENSSL_cpuid_setup myself. How do you see it?

@gut
Copy link
Author

gut commented Jul 11, 2017

Hi. I just got my CLA approved (check on http://bugs.python.org/issue30102 , there is an * now next to my username).
How do we trigger the @the-knights-who-say-ni to recheck this?

I hope we can continue reviewing this now. I didn't ping before as I wanted to sign the CLA first.

Thanks in advance

Modules/_ssl.c Outdated
@@ -137,6 +138,8 @@ struct py_ssl_library_code {
#else /* OpenSSL < 1.1.0 */
#if defined(WITH_THREAD)
#define HAVE_OPENSSL_CRYPTO_LOCK
/* For some reason, this function is not declared on OpenSSL's headers */
void OPENSSL_cpuid_setup(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to only define this prototype if compiling with threads?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it was not. I just noticed it is inside of this definition. Great catch!
It is intended to be defined only when /* OpenSSL < 1.1.0 */.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

Modules/_ssl.c Outdated
#endif

/* For some reason, this function is not declared on OpenSSL's headers */
void OPENSSL_cpuid_setup(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function guaranteed to be available on all versions of OpenSSL and LibreSSL that are supported by Python? At the moment it's 0.9.8 to 1.1.x and all versions of LibreSSL?

Copy link
Author

@gut gut Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a nice way to verify that? I see that you have a project to compile a set of libressl and openssl versions.

Checking on OpenSSL, I notice that the commit that implemented this function is dated back to 2004 when 0.9.8 version didn't exist.

And on Libressl, it was added on 2008.

Would that be enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, isn't Travis CI already checking these things?

@gut
Copy link
Author

gut commented Aug 7, 2017

just solved the conflict: now the PCbuild/_hashlib.vcxproj file looks more similar to the approach of PCbuild/_ssl.vcxproj.

Is there anything else missing?

@tiran : please comment my previous reply to your review. Is that enough?

@gut
Copy link
Author

gut commented Sep 5, 2017

I guess we can close this PR as I just confirmed that the approved #3112 solves the issue I'm interested on.
Thanks you all for taking your time to handle this issue.

@gut gut closed this Sep 5, 2017
@gut gut deleted the improve-POWER8-performance-on-libssl branch September 5, 2017 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants