Add optimized crc32 for Power 8+ processors #478

mscastanho · 2020-03-12T13:56:51Z

Hi @madler and zlib community,

This is an updated version of @grooverdan 's PR #335. It includes the following changes:

Rebase against develop branch
Use new ifunc infrastructure from other Power-related PRs
Fix warnings on crc32_test.c when compiling with -Wall
Support for CMake
Match zlib's use of unsigned long on crc32
Dropped clang support (see below)

Regarding clang, it is not safe to call getauxval() (and other glibc functions) from inside an ifunc resolver, so we ended up using __builtin_cpu_supports for feature detection. This builtin is not currently supported by clang on Power. When using clang, the code will compile fine, but without the optimizations. Still, some of the workarounds for clang throughout the code were left as they seem harmless for other compilers.

To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.

The results below show decompression throughput in MB/s using gzip wrapper, for all compression levels:

pngpixels

comp lvl	default	crc32-power	gain
1	290.5	375.4	+29.23%
2	311.5	411.7	+32.17%
3	349.5	482.3	+38.00%
4	254.2	317.1	+24.74%
5	273.9	349.4	+27.56%
6	313.7	417.6	+33.12%
7	330.8	448.3	+35.52%
8	365.5	514.7	+40.82%
9	383.6	551.7	+43.82%

jpeg

comp lvl	default	crc32-power	gain
1	406.3	619.3	+52.42%
2	361.0	520.5	+44.18%
3	369.7	539.7	+45.98%
4	742.4	1938.7	+161.14%
5	740.2	1961.9	+165.05%
6	740.3	1950.7	+163.50%
7	737.9	1956.8	+165.18%
8	731.2	1936.4	+164.82%
9	743.2	1961.5	+163.93%

executable

comp lvl	default	crc32-power	gain
1	210.4	251.7	+19.63%
2	217.0	261.2	+20.37%
3	224.6	272.7	+21.42%
4	212.6	255.7	+20.27%
5	216.9	261.8	+20.70%
6	222.8	270.5	+21.41%
7	224.5	272.5	+21.38%
8	227.1	277.1	+22.02%
9	227.7	277.4	+21.83%

html

comp lvl	default	crc32-power	gain
1	237.1	288.9	+21.85%
2	245.9	302.0	+22.81%
3	256.4	318.0	+24.02%
4	247.3	306.0	+23.74%
5	245.5	302.9	+23.38%
6	250.1	309.9	+23.91%
7	250.2	309.9	+23.86%
8	250.6	310.3	+23.82%
9	250.4	309.9	+23.76%

This commit adds an optimized version of the crc32 function based on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ . The code has been relicensed to the zlib license. This is the C implementation created by Rogerio Alves <rogealve@br.ibm.com> It makes use of vector instructions to speed up CRC32 algorithm. Decompression times were improved by +30% on tests. Based on Daniel Black's work for the original zlib (madler/zlib#478).

fweimer-rh · 2021-12-13T14:13:44Z

contrib/power/crc32_z_resolver.c

+
+Z_IFUNC(crc32_z) {
+#ifdef Z_POWER8
+    if (__builtin_cpu_supports("arch_2_07"))


As discussed off-PR, I think this needs to check for VCRYPTO.

ljavorsk · 2022-04-06T07:09:20Z

Hi @mscastanho, we are just rebasing the new release from zlib on Fedora and RHEL and this PR is part of our downstream patches.
However, after the new release was released, it brings a lot of new conflicts.

Is there any way you are working on fixing these conflicts?

If not we are not sure we can backport this anymore due to the complicity of this PR.

Looking forward to your reply.
Thank you for your contribution :)

mscastanho · 2022-04-06T12:47:02Z

@ljavorsk I tried a local rebase and couldn't see any conflicts. Based on the Fedora zlib repo I suspect you're using the old patch from #335. In which case I'd suggest to replace it with this PR, which has some other fixes.

Optimized functions for Power will make use of GNU indirect functions, an extension to support different implementations of the same function, which can be selected during runtime. This will be used to provide optimized functions for different processor versions. Since this is a GNU extension, we placed the definition of the Z_IFUNC macro under `contrib/gcc`. This can be reused by other archs as well. Author: Matheus Castanho <msc@linux.ibm.com> Author: Rogerio Alves <rcardoso@linux.ibm.com>

This commit adds an optimized version for the crc32 function based on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ This is the C implementation created by Rogerio Alves <rogealve@br.ibm.com> It makes use of vector instructions to speed up CRC32 algorithm.

Clang 7 changed the behavior of vec_xxpermdi in order to match GCC's behavior. After this change, code that used to work on Clang 6 stopped to work on Clang >= 7. Tested on Clang 6, 7, 8 and 9. Reference: https://bugs.llvm.org/show_bug.cgi?id=38192 Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>

mscastanho · 2022-04-06T12:59:23Z

Rebased on top of current develop.

ljavorsk · 2022-04-07T06:58:54Z

Thanks @mscastanho. So you say this is the same but improved patch?
Or does it have some other differences we should know about?

Because the Fedora patch that you've linked was part of a customer request done by your IBM colleagues.
You can see it here: https://bugzilla.redhat.com/show_bug.cgi?id=1666798

mscastanho · 2022-04-07T12:01:41Z

So you say this is the same but improved patch?

@ljavorsk Exactly. In terms of optimization, it's the same code. What changed is now it's also enabled when building with cmake, the ifunc usage was improved and there are a few other minor fixes (complete list on the PR description).

Should we open a new request to replace that with this new patch then?

ljavorsk · 2022-04-07T13:01:47Z

@mscastanho That's a great idea to open a new request, it would help us track it and we can also have some description there for future reference.

Either way, thank you for the explanation.

ljavorsk · 2022-10-17T09:12:28Z

Hi @mscastanho ,

Could you please also rebase the patch on top of the new zlib-1.2.13 version?

Thank you so much :)

tuliom · 2022-10-19T12:07:59Z

Could you please also rebase the patch on top of the new zlib-1.2.13 version?

@RajalakshmiSR, considering that @mscastanho does not have access to POWER servers anymore, could you give a hand with this, please?
This is important for Fedora and other distros.

Neustradamus · 2022-10-26T04:15:27Z

To follow.

RajalakshmiSR · 2022-11-09T13:58:22Z

@mmatti-sw Can you help to rebase?

mmatti-sw · 2022-11-21T15:45:14Z

I have created #750 the old patches are now rebased to v1.2.13.

grooverdan mentioned this pull request Mar 13, 2020

Optimized CRC32 framework with Power crc32 c optimized function #335

Closed

mscastanho mentioned this pull request Jun 10, 2020

Add new crc32 tests and optimized crc32 for POWER zlib-ng/zlib-ng#637

Merged

mscastanho force-pushed the crc32-power branch from 6ca5013 to 6d4c1fd Compare December 13, 2021 13:56

fweimer-rh reviewed Dec 13, 2021

View reviewed changes

Rogerio Alves and others added 3 commits April 6, 2022 09:50

mscastanho force-pushed the crc32-power branch from 6d4c1fd to 58ad234 Compare April 6, 2022 12:58

er-1 mentioned this pull request Nov 9, 2022

Update zlib on AT next advancetoolchain/advance-toolchain#3118

Closed

mmatti-sw mentioned this pull request Nov 21, 2022

Add optimized crc32 for Power 8+ processors #750

Open

Neustradamus mentioned this pull request Aug 23, 2023

IBM Power Processors and Zlib #847

Open

mmatti-sw mentioned this pull request Sep 15, 2023

Add optimized crc32 for Power 8+ processors #857

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add optimized crc32 for Power 8+ processors #478

Add optimized crc32 for Power 8+ processors #478

mscastanho commented Mar 12, 2020

fweimer-rh Dec 13, 2021

ljavorsk commented Apr 6, 2022

mscastanho commented Apr 6, 2022

mscastanho commented Apr 6, 2022

ljavorsk commented Apr 7, 2022

mscastanho commented Apr 7, 2022

ljavorsk commented Apr 7, 2022

ljavorsk commented Oct 17, 2022

tuliom commented Oct 19, 2022

Neustradamus commented Oct 26, 2022

RajalakshmiSR commented Nov 9, 2022

mmatti-sw commented Nov 21, 2022

Add optimized crc32 for Power 8+ processors #478

Are you sure you want to change the base?

Add optimized crc32 for Power 8+ processors #478

Conversation

mscastanho commented Mar 12, 2020

fweimer-rh Dec 13, 2021

Choose a reason for hiding this comment

ljavorsk commented Apr 6, 2022

mscastanho commented Apr 6, 2022

mscastanho commented Apr 6, 2022

ljavorsk commented Apr 7, 2022

mscastanho commented Apr 7, 2022

ljavorsk commented Apr 7, 2022

ljavorsk commented Oct 17, 2022

tuliom commented Oct 19, 2022

Neustradamus commented Oct 26, 2022

RajalakshmiSR commented Nov 9, 2022

mmatti-sw commented Nov 21, 2022