
Commit cce6624

Add support for IBM Z hardware-accelerated deflate
IBM Z mainframes starting from version z15 provide DFLTCC instruction, which implements deflate algorithm in hardware with estimated compression and decompression performance orders of magnitude faster than the current zlib and ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. In order to enable it, the following build commands should be used:

    $ ./configure --dfltcc
    $ make

When built like this, zlib would compress in hardware on level 1, and in software on all other levels. Decompression will always happen in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to make it used by default) one could either configure with --dfltcc-level-mask=0x7e or set the environment variable DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when they both are made on machines of the same generation, and when the respective buffers have the same offset relative to the start of the page. Therefore care should be taken when using hardware compression when reproducible results are desired. One such use case - reproducible software builds - is handled explicitly: when SOURCE_DATE_EPOCH environment variable is set, the hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

    * inflate(Z_BLOCK) and inflate(Z_TREES)
    * inflateMark()
    * inflatePrime()
    * inflateSyncPoint()

When used, these functions will either switch to software, or, in case this is not possible, gracefully fail.

This patch tries to add DFLTCC support in the least intrusive way. All SystemZ-specific code is placed into a separate file, but unfortunately there is still a noticeable amount of changes in the main zlib code. Below is the summary of these changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output buffer and a window.
Since DFLTCC requires parameter block to be doubleword-aligned, and it's reasonable to allocate it alongside deflate and inflate states, ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE macros were introduced in order to encapsulate the allocation details. The same is true for window, for which ZALLOC_WINDOW and TRY_FREE_WINDOW macros were introduced.

Software and hardware window formats do not match, therefore, deflateSetDictionary(), deflateGetDictionary(), inflateSetDictionary() and inflateGetDictionary() need special handling, which is triggered using DEFLATE_SET_DICTIONARY_HOOK, DEFLATE_GET_DICTIONARY_HOOK, INFLATE_SET_DICTIONARY_HOOK and INFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC parameter block, which is allocated alongside zlib state, using the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

The new DEFLATE_PARAMS_HOOK switches between hardware and software deflate implementations when deflateParams() arguments demand this.

The new INFLATE_PRIME_HOOK, INFLATE_MARK_HOOK and INFLATE_SYNC_POINT_HOOK macros make the respective unsupported calls gracefully fail.

The algorithm implemented in hardware has different compression ratio than the one implemented in software. In order for deflateBound() to return the correct results for the hardware implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the window on its own, calling updatewindow() is suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32 checksums, therefore, whenever it's used, software checksumming needs to be suppressed using the new DEFLATE_NEED_CHECKSUM and INFLATE_NEED_CHECKSUM macros.
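
The doubleword-alignment requirement mentioned above can be met by over-allocating the state and placing the parameter block at the next 8-byte boundary behind it. The following is only a sketch of that idea, not the patch's ZALLOC_STATE implementation: the struct names, the 64-byte block size and both helper functions are hypothetical, and it assumes malloc() returns storage that is at least 8-byte aligned.

```c
#include <stddef.h>
#include <stdlib.h>

#define DOUBLEWORD 8u /* a doubleword on IBM Z is 8 bytes */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical stand-ins for the zlib state and the DFLTCC parameter block. */
struct fake_state { char tag[12]; };
struct fake_param_block { unsigned char bytes[64]; };

/* Allocate both objects in one buffer. The caller frees the returned
 * pointer, which releases the parameter block as well. */
static struct fake_state *alloc_state_with_param(struct fake_param_block **param)
{
    size_t offset = align_up(sizeof(struct fake_state), DOUBLEWORD);
    unsigned char *buf = malloc(offset + sizeof(struct fake_param_block));

    if (buf == NULL)
        return NULL;
    /* Assumption: buf itself is at least 8-byte aligned, so buf + offset
     * is doubleword-aligned too. */
    *param = (struct fake_param_block *)(buf + offset);
    return (struct fake_state *)buf;
}
```

Keeping both objects in one allocation means a single free (or a single copy of `offset + sizeof(block)` bytes) handles the state and its parameter block together, which is presumably why the patch hides these details behind allocation macros.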
DFLTCC will refuse to write an End-of-block Symbol if there is no input data, thus in some cases it is necessary to do this manually. In order to achieve this, send_bits, bi_reverse, bi_windup and flush_pending were promoted from local to ZLIB_INTERNAL. Furthermore, since block and stream termination must be handled in software as well, block_state enum was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it might not be allocated yet, inflate_ensure_window was factored out of updatewindow and made ZLIB_INTERNAL.
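
The level selection described in the commit message can be read as a bitmask in which bit N enables hardware compression for level N: 0x7e sets bits 1 through 6, and the default build (level 1 only) would correspond to 0x02. The sketch below illustrates that reading together with the SOURCE_DATE_EPOCH rule. Both function names are hypothetical, and the bit layout is an assumption inferred from the message, not code taken from the patch.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the patch: decides whether a given
 * compression level would be routed to hardware.
 * Assumption: bit N of level_mask enables hardware for level N.
 * Callers are expected to map Z_DEFAULT_COMPRESSION (-1) to 6 first. */
static int would_use_hw_deflate(int level, unsigned long level_mask,
                                int reproducible_build)
{
    if (reproducible_build)
        return 0; /* SOURCE_DATE_EPOCH set: hardware compression is disabled */
    if (level < 0 || level > 9)
        return 0; /* outside zlib's explicit level range */
    return (int)((level_mask >> level) & 1UL);
}

/* Returns nonzero when SOURCE_DATE_EPOCH is present in the environment. */
static int is_reproducible_build(void)
{
    return getenv("SOURCE_DATE_EPOCH") != NULL;
}
```

With this reading, `would_use_hw_deflate(6, 0x7eUL, is_reproducible_build())` is nonzero only when the mask covers level 6 and the build is not marked reproducible.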
1 parent 8ef06a3 commit cce6624

17 files changed: 1469 additions and 63 deletions.

Makefile.in

Lines changed: 8 additions & 0 deletions
@@ -139,6 +139,14 @@ match.lo: match.S
 	mv _match.o match.lo
 	rm -f _match.s
 
+dfltcc.o: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
+	$(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)contrib/s390/dfltcc.c
+
+dfltcc.lo: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
+	-@mkdir objs 2>/dev/null || test -d objs
+	$(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/dfltcc.o $(SRCDIR)contrib/s390/dfltcc.c
+	-@mv objs/dfltcc.o $@
+
 crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
 	$(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c

compress.c

Lines changed: 13 additions & 1 deletion
@@ -5,9 +5,15 @@
 
 /* @(#) $Id$ */
 
-#define ZLIB_INTERNAL
+#include "zutil.h"
 #include "zlib.h"
 
+#ifdef DFLTCC
+# include "contrib/s390/dfltcc.h"
+#else
+#define DEFLATE_BOUND_COMPLEN(source_len) 0
+#endif
+
 /* ===========================================================================
      Compresses the source buffer into the destination buffer. The level
    parameter has the same meaning as in deflateInit. sourceLen is the byte
@@ -81,6 +87,12 @@ int ZEXPORT compress(dest, destLen, source, sourceLen)
 uLong ZEXPORT compressBound(sourceLen)
     uLong sourceLen;
 {
+    uLong complen = DEFLATE_BOUND_COMPLEN(sourceLen);
+
+    if (complen > 0)
+        /* Architecture-specific code provided an upper bound. */
+        return complen + ZLIB_WRAPLEN;
+
     return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
            (sourceLen >> 25) + 13;
 }
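
The control flow of the patched compressBound() above can be tried out in isolation: an architecture hook either returns a positive bound, to which the wrapper length is added, or returns 0, in which case the stock formula applies. This is a standalone sketch, not zlib code. The function-pointer hook replaces the DEFLATE_BOUND_COMPLEN macro for illustration, `toy_arch_bound` is a made-up bound, and the wrapper length of 6 (2-byte zlib header plus 4-byte Adler-32 trailer) is an assumption about what ZLIB_WRAPLEN stands for.

```c
#include <stddef.h>

/* Assumed value of ZLIB_WRAPLEN: 2-byte header plus 4-byte Adler-32 trailer. */
#define WRAPLEN_ASSUMED 6UL

/* Hypothetical hook type standing in for the DEFLATE_BOUND_COMPLEN macro. */
typedef unsigned long bound_fn(unsigned long source_len);

static unsigned long bound_with_hook(unsigned long source_len,
                                     bound_fn *arch_bound)
{
    unsigned long complen = arch_bound ? arch_bound(source_len) : 0;

    if (complen > 0)
        /* Architecture-specific code provided an upper bound. */
        return complen + WRAPLEN_ASSUMED;

    /* Stock software bound, copied from the context lines of the diff. */
    return source_len + (source_len >> 12) + (source_len >> 14) +
           (source_len >> 25) + 13;
}

/* Mirrors the non-DFLTCC build, where the macro expands to 0. */
static unsigned long no_arch_bound(unsigned long n) { (void)n; return 0; }

/* Made-up bound for illustration only; not the DFLTCC formula. */
static unsigned long toy_arch_bound(unsigned long n) { return n + n / 8 + 16; }
```

Returning 0 as a "no opinion" value keeps the software path byte-for-byte unchanged for builds without DFLTCC, which matches the patch's stated goal of being minimally intrusive.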

configure

Lines changed: 24 additions & 0 deletions
@@ -118,6 +118,7 @@ case "$1" in
       echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
       echo ' [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
       echo ' [--includedir=INCLUDEDIR] [--archs="-arch i386 -arch x86_64"]' | tee -a configure.log
+      echo ' [--dfltcc] [--dfltcc-level-mask=MASK]' | tee -a configure.log
         exit 0 ;;
     -p*=* | --prefix=*) prefix=`echo $1 | sed 's/.*=//'`; shift ;;
     -e*=* | --eprefix=*) exec_prefix=`echo $1 | sed 's/.*=//'`; shift ;;
@@ -142,6 +143,16 @@ case "$1" in
     -w* | --warn) warn=1; shift ;;
     -d* | --debug) debug=1; shift ;;
     --sanitize) sanitize=1; shift ;;
+    --dfltcc)
+      CFLAGS="$CFLAGS -DDFLTCC"
+      OBJC="$OBJC dfltcc.o"
+      PIC_OBJC="$PIC_OBJC dfltcc.lo"
+      shift
+      ;;
+    --dfltcc-level-mask=*)
+      CFLAGS="$CFLAGS -DDFLTCC_LEVEL_MASK=`echo $1 | sed 's/.*=//'`"
+      shift
+      ;;
     *)
       echo "unknown option: $1" | tee -a configure.log
       echo "$0 --help for help" | tee -a configure.log
@@ -828,6 +839,19 @@ EOF
   fi
 fi
 
+# Check whether sys/sdt.h is available
+cat > $test.c << EOF
+#include <sys/sdt.h>
+int main() { return 0; }
+EOF
+if try $CC -c $CFLAGS $test.c; then
+    echo "Checking for sys/sdt.h ... Yes." | tee -a configure.log
+    CFLAGS="$CFLAGS -DHAVE_SYS_SDT_H"
+    SFLAGS="$SFLAGS -DHAVE_SYS_SDT_H"
+else
+    echo "Checking for sys/sdt.h ... No." | tee -a configure.log
+fi
+
 # test to see if we can use a gnu indirection function to detect and load optimized code at runtime
 echo >> configure.log
 cat > $test.c <<EOF

contrib/README.contrib

Lines changed: 4 additions & 0 deletions
@@ -55,6 +55,10 @@ puff/ by Mark Adler <madler@alumni.caltech.edu>
         Small, low memory usage inflate. Also serves to provide an
         unambiguous description of the deflate format.
 
+s390/       by Ilya Leoshkevich <iii@linux.ibm.com>
+        Hardware-accelerated deflate on IBM Z with DEFLATE CONVERSION CALL
+        instruction.
+
 testzlib/   by Gilles Vollant <info@winimage.com>
         Example of the use of zlib

contrib/s390/README.txt

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
IBM Z mainframes starting from version z15 provide DFLTCC instruction,
2+
which implements deflate algorithm in hardware with estimated
3+
compression and decompression performance orders of magnitude faster
4+
than the current zlib and ratio comparable with that of level 1.
5+
6+
This directory adds DFLTCC support. In order to enable it, the following
7+
build commands should be used:
8+
9+
$ ./configure --dfltcc
10+
$ make
11+
12+
When built like this, zlib would compress in hardware on level 1, and in
13+
software on all other levels. Decompression will always happen in
14+
hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
15+
make it used by default) one could either configure with
16+
--dfltcc-level-mask=0x7e or set the environment variable
17+
DFLTCC_LEVEL_MASK to 0x7e at run time.
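
The README above says the level mask can be overridden at run time through the DFLTCC_LEVEL_MASK environment variable. How dfltcc.c actually reads it is not shown on this page, so the following is only a sketch of one way such an override could be parsed. Both function names, the fallback value 0x2 (level 1 only) and the rejection of malformed input are assumptions.

```c
#include <stdlib.h>

/* Assumption: the compile-time default enables hardware for level 1 only. */
#define DEFAULT_LEVEL_MASK 0x2UL

/* Hypothetical parser: returns fallback when text is missing or malformed. */
static unsigned long parse_level_mask(const char *text, unsigned long fallback)
{
    char *end;
    unsigned long mask;

    if (text == NULL || *text == '\0')
        return fallback;
    mask = strtoul(text, &end, 0); /* base 0 accepts 0x7e, 126 and 0176 */
    if (*end != '\0')
        return fallback; /* trailing garbage: ignore the override */
    return mask;
}

/* Reads the override from the environment, as the README describes. */
static unsigned long level_mask_from_env(void)
{
    return parse_level_mask(getenv("DFLTCC_LEVEL_MASK"), DEFAULT_LEVEL_MASK);
}
```

Running a program as `DFLTCC_LEVEL_MASK=0x7e ./app` would then make `level_mask_from_env()` return 0x7e, while an unset variable leaves the built-in default in place.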
