Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate modern optimized zlib #416

Closed
gdevenyi opened this issue Jan 16, 2019 · 58 comments
Closed

Integrate modern optimized zlib #416

gdevenyi opened this issue Jan 16, 2019 · 58 comments
Assignees
Labels
area:ThirdParty Issues affecting the ThirdParty module

Comments

@gdevenyi
Copy link
Contributor

Related to #348 zlib-ng is a project attempting to modernize the zlib codebase:
https://github.com/zlib-ng/zlib-ng

@thewtex
Copy link
Member

thewtex commented Jan 16, 2019

Also pigz (a parallel version of zlib):

https://zlib.net/pigz/

@hjmjohnson
Copy link
Member

I've heard good things about pigz (Chris Rorden is a big fan). I'm initially a bit concerned that the source code is only available as a tarball from 2 years ago.

It also feels like it may work well in posix environments, but I don't see how well it is supported across platforms.

@neurolabusc
Copy link

You might find the my comparison of different GZ compression strategies relevant. For that test I intentionally used a slow hard disk. With zlib you can write a compressed file direct to disk. Traditionally, with pigz we need to write the raw data to disk and then pigz loads this file and compresses it. Therefore, one needs to wonder if the parallel performance of pigz can be offset by the disk IO. The link also shows that if you have a modern version of pigz on Unix you can use a named-pipe to avoid writing the whole file to a slow disk. In my experience, pigz works fine on Windows, though I don't think you can use the named-pipe trick. The fact that it has not changed much in the last years may just be a sign it is mature. The original deflate/gz format dates back to a time when memory was limited and multiple cores was exotic. It has achieved widespread use, but innovations will really come from new standards that leverage modern computers. zstd is extremely impressive, in particular for medical datasets when it is combined with a byte-shuffling filter.

In my testing, I always found Cloudflare's zlib faster than zlib-ng. However, this may have changed or may have been due to compiler settings. Cloudflare seems stuck at an older version of zlib (1.2.8).

@gdevenyi
Copy link
Contributor Author

Discussion regarding cloudflare's version compared to zlib-ng: zlib-ng/zlib-ng#42

@thewtex thewtex added the area:ThirdParty Issues affecting the ThirdParty module label Jan 24, 2019
@dzenanz dzenanz self-assigned this Apr 13, 2019
@stale
Copy link

stale bot commented Aug 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status:Backlog Postponed without a fixed deadline label Aug 11, 2019
@stale stale bot closed this as completed Sep 10, 2019
@gdevenyi
Copy link
Contributor Author

Ping, still relevant

@neurolabusc
Copy link

@gdevenyi the page you cite compares zlib-ng to Cloudflare. It is useful, but I think it is a little out of date. More recent patches to Cloudflare provide good MacOS support. When I looked into this (perhaps a year ago) the benefit of Cloudflare was that it is faster. However, Cloudflare has two limitations: it will fail on old CPUs (more than 10 years old) and it has different licensing.

Specifically, to get optimal performance from CloudFlare you must build with a a bit of [GPL code]. (https://github.com/cloudflare/zlib/blob/gcc.amd64/contrib/amd64/crc32-pclmul_asm.S) which makes it incompatible with some projects. If you must retain the pure zlib license, you should stick with zlib-ng, build CloudFlare without requiring crc32-pclmul_asm.S or make a clean-room replacement for that code. The final option would not be too hard, as Intel provided open source sample code available on Github when they introduced these instructions.

@gdevenyi
Copy link
Contributor Author

Indeed @neurolabusc zlib-ng intends to retain the zlib licence, and pulls from all available zlib forks where licencing allows (intel, cloudflare, chromium, others?)

@thewtex thewtex reopened this Sep 24, 2019
@stale stale bot removed the status:Backlog Postponed without a fixed deadline label Sep 24, 2019
@stale
Copy link

stale bot commented Jan 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the status:Backlog Postponed without a fixed deadline label Jan 22, 2020
@neurolabusc
Copy link

neurolabusc commented Jan 23, 2020

@gdevenyi I found that the Chrome BSD-licensed CRC SIMD code works at least as well as the GPL-license CRC. If you want, you can test my CloudFlare zlib clone where the GPL code has been replaced. If others validate this, I will make a pull request for the main repository. I think this will provide the performance of Cloudflare without the restrictive GPL license. This would allow us to use these libraries in many non-GPL tools in our field, like FSL.

I looked at zlib-ng, and it now also includes a SIMD CRC implementation. So I think it is time to do another comparison to see whether zlib-ng has pulled ahead of CloudFlare zlib.

This benefits all gz compression, but the effects are most profound in pigz, since CRC must be done in serial, the parallel pigz is fighting Amdahl's law.

@stale stale bot removed the status:Backlog Postponed without a fixed deadline label Jan 23, 2020
@gdevenyi
Copy link
Contributor Author

Wow this is great. The zlib-ng project is definitely putting together infrastructure for a nice clean modern zlib with multi-arch support and continuous integration. I haven't had a chance to look at your implementation details @neurolabusc, but I did dig up zlib-ng's CRC implementation commit, zlib-ng/zlib-ng@3684659

It looks like the implementation comes from an intel patched version: https://github.com/jtkukunas/zlib but they're both based on the same whitepaper.

I think the most important thing to check is if the intel version works properly on AMD...

@gdevenyi
Copy link
Contributor Author

I'm going to fork/update this: https://github.com/jsnell/zlib-bench add your version, and generate some comparisons.

@gdevenyi
Copy link
Contributor Author

gdevenyi commented Jan 24, 2020

Results:
gcc 9.2.1 on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

                     baseline               cloudflare             intel                  zlib-ng                zlib-ng-modern         rodren-zlib            

decompress executable (50 iterations)
 Execution time [s]:  1.25 ± 0.00 (100.00%)  1.08 ± 0.00 ( 86.61%)  0.99 ± 0.00 ( 79.30%)  1.04 ± 0.00 ( 83.56%)  1.03 ± 0.01 ( 82.31%)  1.07 ± 0.01 ( 85.98%) 
decompress html (50 iterations)
 Execution time [s]:  0.72 ± 0.00 (100.00%)  0.63 ± 0.01 ( 86.38%)  0.54 ± 0.01 ( 74.96%)  0.59 ± 0.01 ( 81.17%)  0.58 ± 0.00 ( 79.88%)  0.62 ± 0.01 ( 86.21%) 
decompress jpeg (50 iterations)
 Execution time [s]:  0.24 ± 0.00 (100.00%)  0.15 ± 0.01 ( 61.99%)  0.16 ± 0.01 ( 66.31%)  0.25 ± 0.01 (104.56%)  0.25 ± 0.01 (106.19%)  0.14 ± 0.01 ( 61.07%) 
decompress pngpixels (50 iterations)
 Execution time [s]:  0.82 ± 0.00 (100.00%)  0.64 ± 0.00 ( 78.00%)  0.59 ± 0.00 ( 71.72%)  0.64 ± 0.01 ( 77.66%)  0.64 ± 0.00 ( 77.10%)  0.63 ± 0.00 ( 77.06%) 

compress executable -1 (10 iterations)
  Compression ratio:  0.37                   0.37                   0.37                   0.46                   0.46                   0.37                  
 Execution time [s]:  0.84 ± 0.00 (100.00%)  0.50 ± 0.00 ( 59.66%)  0.57 ± 0.00 ( 68.02%)  0.28 ± 0.00 ( 34.06%)  0.29 ± 0.00 ( 34.11%)  0.50 ± 0.00 ( 59.46%) 
compress html -1 (10 iterations)
  Compression ratio:  0.39                   0.37                   0.37                   0.54                   0.54                   0.37                  
 Execution time [s]:  0.44 ± 0.00 (100.00%)  0.27 ± 0.00 ( 61.71%)  0.32 ± 0.00 ( 72.03%)  0.19 ± 0.00 ( 42.82%)  0.19 ± 0.00 ( 43.91%)  0.27 ± 0.00 ( 61.32%) 
compress jpeg -1 (10 iterations)
  Compression ratio:  1.00                   1.00                   1.00                   1.05                   1.05                   1.00                  
 Execution time [s]:  0.66 ± 0.00 (100.00%)  0.52 ± 0.00 ( 79.30%)  0.48 ± 0.00 ( 72.42%)  0.24 ± 0.00 ( 37.18%)  0.24 ± 0.00 ( 37.16%)  0.52 ± 0.00 ( 78.54%) 
compress pngpixels -1 (10 iterations)
  Compression ratio:  0.17                   0.17                   0.17                   0.23                   0.23                   0.17                  
 Execution time [s]:  0.48 ± 0.00 (100.00%)  0.26 ± 0.00 ( 53.56%)  0.31 ± 0.00 ( 64.09%)  0.18 ± 0.00 ( 36.38%)  0.17 ± 0.00 ( 36.04%)  0.26 ± 0.00 ( 53.60%) 

compress executable -3 (10 iterations)
  Compression ratio:  0.35                   0.36                   0.36                   0.36                   0.36                   0.36                  
 Execution time [s]:  1.16 ± 0.00 (100.00%)  0.61 ± 0.00 ( 52.09%)  0.69 ± 0.00 ( 59.27%)  0.71 ± 0.00 ( 61.34%)  0.71 ± 0.00 ( 61.01%)  0.61 ± 0.00 ( 52.11%) 
compress html -3 (10 iterations)
  Compression ratio:  0.36                   0.35                   0.35                   0.35                   0.35                   0.35                  
 Execution time [s]:  0.66 ± 0.00 (100.00%)  0.35 ± 0.00 ( 53.52%)  0.41 ± 0.00 ( 61.62%)  0.41 ± 0.00 ( 61.61%)  0.41 ± 0.00 ( 62.13%)  0.36 ± 0.00 ( 53.91%) 
compress jpeg -3 (10 iterations)
  Compression ratio:  1.00                   1.00                   1.00                   1.00                   1.00                   1.00                  
 Execution time [s]:  0.66 ± 0.00 (100.00%)  0.52 ± 0.00 ( 78.39%)  0.48 ± 0.00 ( 71.75%)  0.55 ± 0.00 ( 83.48%)  0.56 ± 0.00 ( 83.77%)  0.53 ± 0.00 ( 79.14%) 
compress pngpixels -3 (10 iterations)
  Compression ratio:  0.15                   0.15                   0.15                   0.15                   0.15                   0.15                  
 Execution time [s]:  0.86 ± 0.00 (100.00%)  0.43 ± 0.00 ( 49.34%)  0.55 ± 0.00 ( 63.30%)  0.41 ± 0.00 ( 47.90%)  0.42 ± 0.00 ( 48.49%)  0.43 ± 0.00 ( 49.32%) 

compress executable -5 (10 iterations)
  Compression ratio:  0.33                   0.34                   0.34                   0.33                   0.33                   0.34                  
 Execution time [s]:  1.65 ± 0.00 (100.00%)  0.89 ± 0.00 ( 54.23%)  0.86 ± 0.00 ( 52.43%)  0.95 ± 0.00 ( 57.87%)  0.96 ± 0.00 ( 58.14%)  0.89 ± 0.00 ( 53.80%) 
compress html -5 (10 iterations)
  Compression ratio:  0.34                   0.33                   0.33                   0.34                   0.34                   0.33                  
 Execution time [s]:  0.99 ± 0.00 (100.00%)  0.55 ± 0.00 ( 56.11%)  0.50 ± 0.00 ( 50.73%)  0.62 ± 0.01 ( 62.79%)  0.61 ± 0.00 ( 62.18%)  0.55 ± 0.00 ( 56.03%) 
compress jpeg -5 (10 iterations)
  Compression ratio:  1.00                   1.00                   1.00                   1.00                   1.00                   1.00                  
 Execution time [s]:  0.68 ± 0.00 (100.00%)  0.55 ± 0.00 ( 80.59%)  0.63 ± 0.00 ( 92.32%)  0.69 ± 0.00 (101.92%)  0.69 ± 0.00 (102.23%)  0.54 ± 0.00 ( 79.95%) 
compress pngpixels -5 (10 iterations)
  Compression ratio:  0.14                   0.14                   0.14                   0.14                   0.14                   0.14                  
 Execution time [s]:  1.20 ± 0.00 (100.00%)  0.59 ± 0.00 ( 49.31%)  0.68 ± 0.00 ( 56.59%)  0.72 ± 0.00 ( 60.11%)  0.72 ± 0.00 ( 60.34%)  0.59 ± 0.00 ( 49.06%) 

compress executable -9 (10 iterations)
  Compression ratio:  0.33                   0.33                   0.33                   0.33                   0.33                   0.33                  
 Execution time [s]:  9.20 ± 0.01 (100.00%)  3.95 ± 0.01 ( 42.95%)  7.19 ± 0.00 ( 78.11%)  6.76 ± 0.00 ( 73.41%)  6.78 ± 0.00 ( 73.65%)  3.95 ± 0.00 ( 42.96%) 
compress html -9 (10 iterations)
  Compression ratio:  0.33                   0.33                   0.33                   0.33                   0.33                   0.33                  
 Execution time [s]:  2.71 ± 0.00 (100.00%)  1.57 ± 0.00 ( 57.88%)  2.41 ± 0.01 ( 88.89%)  2.25 ± 0.00 ( 83.17%)  2.26 ± 0.00 ( 83.38%)  1.57 ± 0.00 ( 57.82%) 
compress jpeg -9 (10 iterations)
  Compression ratio:  1.00                   1.00                   1.00                   1.00                   1.00                   1.00                  
 Execution time [s]:  0.68 ± 0.00 (100.00%)  0.55 ± 0.00 ( 81.08%)  0.51 ± 0.00 ( 75.88%)  0.57 ± 0.00 ( 84.13%)  0.58 ± 0.00 ( 85.86%)  0.54 ± 0.00 ( 80.15%) 
compress pngpixels -9 (10 iterations)
  Compression ratio:  0.12                   0.12                   0.12                   0.12                   0.12                   0.12                  
 Execution time [s]: 24.84 ± 0.04 (100.00%) 13.80 ± 0.02 ( 55.54%) 20.54 ± 0.03 ( 82.69%) 18.41 ± 0.07 ( 74.12%) 18.46 ± 0.04 ( 74.32%) 13.82 ± 0.01 ( 55.63%)

@gdevenyi
Copy link
Contributor Author

Followup:
On my (sadly old) compute cluster with Intel(R) Xeon(R) CPU E5645 @ 2.40GHz both cloudflare and rordenlab/zlib crash with illegal instruction (despite being built locally on the machine), so something we need to worry about is how flexible the patches/builds are to detecting supported CPU flags and gracefully degrading. zlib-ng guys have a nice modern cmake-based build which helps with this.

@neurolabusc
Copy link

neurolabusc commented Jan 24, 2020

@gdevenyi - curious, the E5645 should be the first generation to support CLMUL instructions. I had realized that my CMake needed a bit more work (e.g. 32-bit operating systems), but your system is perfect as it is right on the cusp of supporting these functions. Since my solution uses the Chrome browser code, I will look at their code to detect this. I agree, zlib-ng has a great CMake. At the very least, we should improve the CloudFlare CMake, though it may prove that recent updates make zlib-ng a better overall choice than Cloudflare.

1.) Can you include the output of the zlib-ng CMake on your E5645, in particular the following (I am interested to see if it enables PCLMULQDQ). Also, can you tell me if your E5645 operating system is set up as 32 or 64-bit:

-- The following features have been enabled:
 * CMAKE_BUILD_TYPE, Build type: Release (default)
 * WITH_OPTIM, Build with optimisation
 * WITH_NEW_STRATEGIES, Use new strategies
 * SSE42_CRC, Support CRC hash generation using the SSE4.2 instruction set, using "-msse4"
 * SSE42_DEFLATE_QUICK, Support SSE4.2-accelerated quick compression
 * PCLMUL_CRC, Support CRC hash generation using PCLMULQDQ, using "-mpclmul"

@gdevenyi
Copy link
Contributor Author

  1. Updated zlib-bench code up here: https://github.com/gdevenyi/zlib-bench/tree/update-benchmarks
  2. 64bit, ubuntu 14.04.5
-- Using CMake version 3.13.1
-- ZLIB_HEADER_VERSION: 1.2.11
-- ZLIBNG_HEADER_VERSION: 1.9.9
-- The C compiler identification is GNU 9.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Arch detected: 'x86_64'
-- Basearch of 'x86_64' has been detected as: 'x86'
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Looking for sys/sdt.h
-- Looking for sys/sdt.h - not found
-- Check size of off64_t
-- Check size of off64_t - done
-- Looking for fseeko
-- Looking for fseeko - found
-- Looking for strerror
-- Looking for strerror - found
-- Looking for unistd.h
-- Looking for unistd.h - found
-- Performing Test HAVE_ATTRIBUTE_VISIBILITY_HIDDEN
-- Performing Test HAVE_ATTRIBUTE_VISIBILITY_HIDDEN - Success
-- Performing Test HAVE_ATTRIBUTE_VISIBILITY_INTERNAL
-- Performing Test HAVE_ATTRIBUTE_VISIBILITY_INTERNAL - Success
-- Performing Test HAVE_BUILTIN_CTZL
-- Performing Test HAVE_BUILTIN_CTZL - Success
-- Performing Test HAVE_PTRDIFF_T
-- Performing Test HAVE_PTRDIFF_T - Success
-- Performing Test HAVE_SSE2_INTRIN
-- Performing Test HAVE_SSE2_INTRIN - Success
-- Performing Test HAVE_SSE42CRC_INLINE_ASM
-- Performing Test HAVE_SSE42CRC_INLINE_ASM - Success
-- Performing Test HAVE_SSE42CRC_INTRIN
-- Performing Test HAVE_SSE42CRC_INTRIN - Success
-- Performing Test HAVE_PCLMULQDQ_INTRIN
-- Performing Test HAVE_PCLMULQDQ_INTRIN - Failed
-- Architecture-specific source files: arch/x86/x86.c;arch/x86/insert_string_sse.c;arch/x86/deflate_quick.c;arch/x86/fill_window_sse.c;arch/x86/slide_sse.c
-- Renaming
--     /home/cic/devgab/projects/src/zlib-bench/zlib.zlib-ng/zconf-ng.h
-- to 'zconf-ng.h.included' because this file is included with zlib
-- but CMake generates it automatically in the build directory.
-- The following features have been enabled:

 * CMAKE_BUILD_TYPE, Build type: Release (default)
 * WITH_OPTIM, Build with optimisation
 * WITH_NEW_STRATEGIES, Use new strategies
 * SSE42_CRC, Support CRC hash generation using the SSE4.2 instruction set, using "-msse4"
 * SSE42_DEFLATE_QUICK, Support SSE4.2-accelerated quick compression

-- The following features have been disabled:

 * ZLIB_COMPAT, Provide a zlib-compatible API
 * WITH_GZFILEOP, Compile with support for gzFile-related functions
 * WITH_SANITIZERS, Build with address sanitizer and all supported sanitizers other than memory sanitizer
 * WITH_MSAN, Build with memory sanitizer
 * WITH_FUZZERS, Build test/fuzz
 * MAINTAINER, Build with maintainer warnings and tests enabled

-- Configuring done
-- Generating done
-- Build files have been written to: /home/cic/devgab/projects/src/zlib-bench/build

@neurolabusc
Copy link

  1. Sorry, did not read your previous mention of the benchmark source. Mea culpa Thanks!

  2. Thanks! The zlib-ng CMake has a clever test. I will adopt this with attribution. The crucial line is:

-- Performing Test HAVE_PCLMULQDQ_INTRIN - Failed

@neurolabusc
Copy link

neurolabusc commented Jan 25, 2020

  1. @gdevenyi I have updated my project, can you please test that it correctly detects that your E5645 is unable to use the accelerated CRC.

  2. Here is performance of my i7-8750H laptop. I tested two NIfTI images: a 16-bit integer raw ASL series, as well as the same image brain-extracted and blurred saved to 32-bit float (simulating processed data). The intel/zlib-ng wins by a modest amount for decompression, the -1 compression level is apples to oranges comparison as intel/zlib compress faster but less effectively, Cloudflare really excels at compressing at the standard and higher compression levels. I do want to say that I am really impressed with the zlib-ng build script. Cloudflare's build script stays close to the original zlib, but CloudFlare-ng does a great job of detecting hardware. My vote would still be for Cloudflare for x86-64 systems, but I suspect zlib-ng is optimized better for other architectures.

                     baseline               cloudflare             intel                  zlib-ng-compat         zlib-ng                rorden-zlib            
decompress asl16.nii (50 iterations)
 Execution time [s]:  5.19 ± 0.02 (100.00%)  4.59 ± 0.01 ( 88.43%)  4.22 ± 0.01 ( 81.36%)  4.24 ± 0.00 ( 81.61%)  4.30 ± 0.01 ( 82.79%)  4.61 ± 0.04 ( 88.90%) 
decompress asl32.nii (50 iterations)
 Execution time [s]:  5.65 ± 0.00 (100.00%)  4.50 ± 0.00 ( 79.70%)  4.83 ± 0.00 ( 85.47%)  4.35 ± 0.00 ( 77.05%)  4.29 ± 0.00 ( 76.04%)  4.49 ± 0.00 ( 79.49%) 
compress asl16.nii -1 (10 iterations)
  Compression ratio:  0.57                   0.57                   0.57                   0.77                   0.77                   0.57                  
 Execution time [s]:  3.14 ± 0.01 (100.00%)  2.22 ± 0.02 ( 70.58%)  2.48 ± 0.03 ( 78.91%)  1.10 ± 0.00 ( 35.15%)  1.12 ± 0.00 ( 35.48%)  2.21 ± 0.00 ( 70.27%) 
compress asl32.nii -1 (10 iterations)
  Compression ratio:  0.44                   0.44                   0.44                   0.49                   0.49                   0.44                  
 Execution time [s]:  5.97 ± 0.06 (100.00%)  3.04 ± 0.04 ( 50.97%)  3.43 ± 0.04 ( 57.51%)  1.38 ± 0.02 ( 23.10%)  1.41 ± 0.03 ( 23.58%)  3.07 ± 0.04 ( 51.52%) 
compress asl16.nii -3 (10 iterations)
  Compression ratio:  0.55                   0.54                   0.54                   0.55                   0.55                   0.54                  
 Execution time [s]:  5.28 ± 0.02 (100.00%)  3.23 ± 0.05 ( 61.21%)  3.73 ± 0.01 ( 70.70%)  3.62 ± 0.00 ( 68.70%)  3.64 ± 0.01 ( 69.10%)  3.16 ± 0.01 ( 59.97%) 
compress asl32.nii -3 (10 iterations)
  Compression ratio:  0.44                   0.44                   0.44                   0.43                   0.43                   0.44                  
 Execution time [s]:  5.91 ± 0.00 (100.00%)  3.00 ± 0.01 ( 50.82%)  3.31 ± 0.00 ( 56.00%)  3.95 ± 0.01 ( 66.77%)  4.00 ± 0.01 ( 67.66%)  3.00 ± 0.01 ( 50.85%) 
compress asl16.nii -5 (10 iterations)
  Compression ratio:  0.55                   0.54                   0.54                   0.54                   0.54                   0.54                  
 Execution time [s]:  7.95 ± 0.06 (100.00%)  4.05 ± 0.01 ( 50.90%)  4.42 ± 0.06 ( 55.60%)  4.88 ± 0.06 ( 61.44%)  4.89 ± 0.02 ( 61.44%)  4.07 ± 0.01 ( 51.16%) 
compress asl32.nii -5 (10 iterations)
  Compression ratio:  0.43                   0.43                   0.43                   0.43                   0.43                   0.43                  
 Execution time [s]:  6.63 ± 0.01 (100.00%)  3.33 ± 0.01 ( 50.29%)  4.63 ± 0.01 ( 69.88%)  5.27 ± 0.01 ( 79.55%)  5.36 ± 0.01 ( 80.83%)  3.34 ± 0.00 ( 50.40%) 
compress asl16.nii -9 (10 iterations)
  Compression ratio:  0.55                   0.54                   0.55                   0.55                   0.55                   0.54                  
 Execution time [s]: 42.90 ± 0.29 (100.00%) 10.42 ± 0.10 ( 24.29%) 38.75 ± 0.19 ( 90.34%) 36.94 ± 0.12 ( 86.11%) 37.24 ± 0.14 ( 86.81%) 10.32 ± 0.06 ( 24.06%) 
compress asl32.nii -9 (10 iterations)
  Compression ratio:  0.43                   0.43                   0.43                   0.43                   0.43                   0.43                  
 Execution time [s]: 11.22 ± 0.02 (100.00%)  6.51 ± 0.01 ( 57.97%)  7.55 ± 0.01 ( 67.28%)  7.74 ± 0.02 ( 68.96%)  7.79 ± 0.01 ( 69.37%)  6.54 ± 0.04 ( 58.31%)

@gdevenyi
Copy link
Contributor Author

  1. Still seems to be enabling it:
    Checking for PCLMUL support ... Yes

Has the same illegal instruction issue.

  1. I agree from a performance standpoint that cloudflare is winning, but from a maintenance/tooling/code quality perspective, zlib-ng has been carefully cleaning up a lot of old compatibility code, adding static code analysers and address sanitizers, it has activity and interest. The performance patches still lurking in cloudflare and intel will eventually make it into zlib-ng, for example avx2 is coming, which I don't see in cloudflare or intel: Add initial AVX2 support. zlib-ng/zlib-ng#502

@neurolabusc
Copy link

  1. I absolutely agree that the zlib-ng trajectory looks good. In my experience, where Cloudflare has a real advantage is in scalp-stripped images where there are long run lengths of the same value. You can see that in my asl32.nii example. I remain skeptical of the value of avx. For many simple tasks, modern computers are memory bandwidth constrained, and it is hard to show a benefit even for explicit AVX instructions like FMA relative to letting the compiler optimize things. Worse, due to energy demands, even a few AVX instructions can lead to frequency scaling. I am happy to be wrong about this. Regardless, I like zlib-ng's best-athlete approach, clever hardware detection, etc. Someone just needs to figure out why CloudFlare performs so well for repeated runs.

  2. @gdevenyi can you please make sure you have the latest version and provide the full output if it fails on your Westmere. I copied the zlib-ng solution and so it should give you the following diagnostics:

-- Performing Test COMPILER_HAS_M_PCLMUL
-- Performing Test COMPILER_HAS_M_PCLMUL - Success
-- compiler supports pclmul
-- Performing Test HAVE_PCLMULQDQ_INTRIN
-- Performing Test HAVE_PCLMULQDQ_INTRIN - Success

I would have thought Westmere will report failure to the last line.

@stale stale bot removed the status:Backlog Postponed without a fixed deadline label Jan 11, 2021
@dzenanz
Copy link
Member

dzenanz commented Jan 11, 2021

@gdevenyi do you have time and will to include zlib-ng into ITK?

@thewtex if done in time, should we include it into 5.2 final?

@gdevenyi
Copy link
Contributor Author

What's the timeframe for 5.2?

@dzenanz
Copy link
Member

dzenanz commented Jan 12, 2021

Weeks. Currently scheduled for Jan 29th.

@gdevenyi
Copy link
Contributor Author

I'm not sure I can make it, so others are welcome to try :)

image

@thewtex
Copy link
Member

thewtex commented Jan 13, 2021

@gdevenyi we are working on the first ITK 5.2 release candidate, but we will have multiple release candidates -- it is reasonable to integrate this for the 5.2 release.

@gdevenyi
Copy link
Contributor Author

I haven't started trying yet, but this will probably bug us out as well, zlib-ng/zlib-ng#827

@stale
Copy link

stale bot commented Jun 11, 2021

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the status:Backlog Postponed without a fixed deadline label Jun 11, 2021
@thewtex
Copy link
Member

thewtex commented Jun 15, 2021

I haven't started trying yet, but this will probably bug us out as well, zlib-ng/zlib-ng#827

Perhaps this is addressed by:

Include zlib.h when zlib compatibility is on (see -DZLIB_COMPAT=ON or --zlib-compat)

? per zlib-ng/zlib-ng#736 (comment)

@stale stale bot removed the status:Backlog Postponed without a fixed deadline label Jun 15, 2021
@gdevenyi
Copy link
Contributor Author

Yes, should probably try to investigate this again since 2.0.4 has been released.

@gdevenyi
Copy link
Contributor Author

After digging into this today, the zlib module in ITK uses the git subtree feature to track changes in a such a way where it is beyond my ability to successfully switch to zlib-ng.

I always end up at:

error: Entry 'Modules/ThirdParty/ZLIB/src/itkzlib/.gitattributes' overlaps with 'Modules/ThirdParty/ZLIB/src/itkzlib/.gitattributes'.  Cannot bind.

I suspect this might be due to the current ITK zlib being a diverged fork of the official zlib while zlib-ng is a different diverged fork.

@dzenanz
Copy link
Member

dzenanz commented Jun 30, 2021

Perhaps we should create a new root commit for zlib-ng? @bradking is this the right approach? If so can you help Gabriel do it? If not, what do you suggest?

@bradking
Copy link
Member

create a new root commit for zlib-ng

Yes, if the project has a disjoint Git commit history from the original zlib.

@bradking
Copy link
Member

@gdevenyi please push your current changes to Modules/ThirdParty/ZLIB/UpdateFromUpstream.sh to a branch in your fork and let me know the name.

@gdevenyi
Copy link
Contributor Author

There's essentially nothing done, but here it is:
https://github.com/gdevenyi/ITK/tree/zlib-ng

@bradking
Copy link
Member

That's a good start:

  • Add a preceding commit that does nothing but git rm -rf Modules/ThirdParty/ZLIB/src/itkzlib to remove the old files before running the new import. That will avoid the "overlaps" problem.

  • Replace name="zlib" with name="zlib-ng"

  • After git_archive, add

      pushd "${extractdir}/${name}-reduced"
      echo "* -whitespace" > .gitattributes
      popd
    
  • Restore the paths='...' list to import only the needed subset of files. I sometimes compute this by cloning the upstream manually, using git rm locally, and trying to build it. Modify CMakeLists.txt if needed to minimize files. After importing, add an extra commit that modifies CMakeLists.txt in the same way (if necessary).

@gdevenyi
Copy link
Contributor Author

That's a good start:

  • Add a preceding commit that does nothing but git rm -rf Modules/ThirdParty/ZLIB/src/itkzlib to remove the old files before running the new import. That will avoid the "overlaps" problem.

I had tried that and it hadn't sorted the overlaps :)

  • Replace name="zlib" with name="zlib-ng"

Good catch 👍

  • After git_archive, add
      pushd "${extractdir}/${name}-reduced"
      echo "* -whitespace" > .gitattributes
      popd
    

This is probably what actually solves the overlaps problem :)

  • Restore the paths='...' list to import only the needed subset of files. I sometimes compute this by cloning the upstream manually, using git rm locally, and trying to build it. Modify CMakeLists.txt if needed to minimize files. After importing, add an extra commit that modifies CMakeLists.txt in the same way (if necessary).

What exactly is the purpose of purging all these extra files? I know zlib-ng has now arch-specific files and such that I can't directly test the need for building, but may mean ITK can't build this properly on other arches.

Thanks very much for your help!

@bradking
Copy link
Member

What exactly is the purpose of purging all these extra files?

We exclude those to minimize the size of the imported content. We aren't developing zlib-ng or building it for external distribution, so none of the associated files is needed. Tests and documentation are not needed. Packaging files are not needed. Etc.

@gdevenyi
Copy link
Contributor Author

An exclude option for the UpdateFromUpstream sounds pretty nice right now :)

@gdevenyi
Copy link
Contributor Author

With third party modules, do we typically build with tests?

@dzenanz
Copy link
Member

dzenanz commented Jun 30, 2021

We typically remove tests entirely when importing a third-party library (as Brad King suggested above). So we don't test third party libraries, we assume they are fine.

@dzenanz
Copy link
Member

dzenanz commented Oct 28, 2021

Done in #2626 #2708 #2803.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:ThirdParty Issues affecting the ThirdParty module
Projects
None yet
Development

No branches or pull requests

7 participants