
KAFKA-15057: Use new interface from zstd-jni #13814

Open
wants to merge 4 commits into trunk

Conversation

@divijvaidya (Contributor) commented Jun 5, 2023

Background

In Kafka's code, every batch of records is stored in an in-memory byte buffer. For compressed workloads, this buffer contains the data in compressed form. Before writing the batch to the log, Kafka performs validations such as ensuring that offsets are monotonically increasing. To perform this validation, Kafka needs to uncompress the data stored in the byte buffer.

For zstd-compressed batches, Kafka uses the ZstdInputStreamNoFinalizer interface provided by the downstream zstd-jni library to perform decompression.

ZstdInputStreamNoFinalizer takes an InputStream as input and provides an InputStream as output. Since Kafka stores the entire batch in a ByteBuffer, Kafka wraps the ByteBuffer in an InputStream to satisfy the input contract of ZstdInputStreamNoFinalizer.
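The wrapping step described above can be sketched as a minimal InputStream adapter over a ByteBuffer. This is an illustrative stand-in, not Kafka's actual class (Kafka ships its own ByteBufferInputStream utility for this purpose):

```java
import java.io.InputStream;
import java.nio.ByteBuffer;

// Illustrative sketch: adapt a ByteBuffer to the InputStream contract
// that ZstdInputStreamNoFinalizer requires as its input.
public class ByteBufferInputStream extends InputStream {
    private final ByteBuffer buffer;

    public ByteBufferInputStream(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public int read() {
        // InputStream contract: return -1 at end of data, otherwise an unsigned byte.
        return buffer.hasRemaining() ? buffer.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) return 0;
        if (!buffer.hasRemaining()) return -1;
        int n = Math.min(len, buffer.remaining());
        buffer.get(dst, off, n); // copies out of the buffer into the caller's array
        return n;
    }
}
```

Note that every bulk read through this adapter copies bytes out of the ByteBuffer into a caller-supplied array, which is the root of the copying described in the next section.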

Problem

ZstdInputStreamNoFinalizer is not a good fit for our use case because we already have the entire compressed batch in a buffer; we have no need for an interface that takes an InputStream as input. Our requirement is for an interface that takes a ByteBuffer as input and provides a stream of uncompressed data as output. Prior to zstd-jni 1.5.5, no such interface existed, hence we were forced to use ZstdInputStreamNoFinalizer.

Usage of ZstdInputStreamNoFinalizer has the following problems:

  1. When decompression of a batch is complete, we try to read one more byte to check whether the actual batch size is equal to the declared batch size. This is done at RecordIterator#next(). With the existing interface, this extra read leads to a JNI call.
  2. Since this interface requires an InputStream as input, we take the ByteBuffer containing the compressed batch and convert it into an InputStream. The interface internally uses an intermediate buffer to read data from this InputStream in chunks. The chunk size is determined by the underlying zstd library, hence we allocate a new buffer with every batch. This leads to the following transformation: ByteBuffer (compressed batch) -> InputStream (compressed batch) -> copy into intermediate ByteBuffer (chunk of compressed batch) -> send chunk to the zstd library for decompression -> refill the intermediate ByteBuffer with the next chunk of the compressed batch.
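The copy chain in point 2 can be illustrated with a plain chunked read loop over an InputStream. The chunk size here is purely illustrative (the real chunk size is chosen by the underlying zstd library); each loop iteration is one extra copy from the wrapped ByteBuffer into an intermediate array:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts how many intermediate-buffer refills (i.e. extra copies) a chunked
// InputStream read loop performs over a batch of a given size.
public class ChunkedReadDemo {

    public static int countRefills(InputStream in, int chunkSize) throws IOException {
        byte[] intermediate = new byte[chunkSize];
        int refills = 0;
        while (in.read(intermediate, 0, chunkSize) != -1) {
            refills++; // each refill copies one chunk from the source into 'intermediate'
        }
        return refills;
    }

    public static void main(String[] args) throws IOException {
        // A 300-byte "batch" read in 128-byte chunks needs 3 refills (128 + 128 + 44).
        InputStream batch = new ByteArrayInputStream(new byte[300]);
        System.out.println(countRefills(batch, 128)); // 3
    }
}
```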

Solution

I have added a new interface to the downstream library zstd-jni to suit Kafka's use case. The new interface, ZstdBufferDecompressingStreamNoFinalizer, takes a ByteBuffer containing compressed data as input and provides a stream of uncompressed data as output. It solves the above problems as follows:

  1. When we read the final decompressed frame, this interface sets a flag to mark that all uncompressed data has been consumed. When RecordIterator#next() tries to determine whether the stream has ended, we simply read the flag and hence do not have to make a JNI call.
  2. It does not require any buffer allocation for the input. It takes the input buffer and passes it across the JNI boundary without any intermediate copying, hence we do not perform any buffer allocation per batch.
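The same pattern exists in the JDK's zlib bindings, which makes for a runnable analogy without depending on zstd-jni: Java 11's Inflater accepts a ByteBuffer as input directly (no intermediate copy on the Java side), and end-of-stream is exposed as the finished() flag rather than requiring an extra decompress call. The class names differ from the zstd-jni interface, but the buffer-in, flag-for-EOF shape is the same:

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Analogy using the JDK's zlib classes, not zstd-jni itself.
public class BufferDecompressDemo {

    // Compress a payload with Deflater (setup for the demo).
    public static byte[] compress(byte[] original) {
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        byte[] buf = new byte[original.length * 2 + 64];
        int len = deflater.deflate(buf);
        deflater.end();
        byte[] out = new byte[len];
        System.arraycopy(buf, 0, out, 0, len);
        return out;
    }

    // Decompress straight from a ByteBuffer (Java 11+ Inflater API):
    // the buffer is handed to the decompressor without an intermediate copy,
    // and finished() is a flag check rather than another decompress call.
    public static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(ByteBuffer.wrap(compressed));
        byte[] out = new byte[1024];
        int n = inflater.inflate(out);
        boolean done = inflater.finished(); // end-of-stream without an extra read
        inflater.end();
        if (!done) throw new IllegalStateException("output buffer too small for demo");
        return new String(out, 0, n);
    }

    public static void main(String[] args) throws DataFormatException {
        System.out.println(decompress(compress("hello, kafka batch".getBytes())));
    }
}
```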

Result

  1. A 10-20% improvement in method throughput, as demonstrated by the microbenchmark report at https://issues.apache.org/jira/secure/attachment/13058907/zstd-upgrade.xlsx . The microbenchmark uses the existing benchmark at https://github.com/apache/kafka/blob/trunk/jmh-benchmarks/src/main/java/org/apache/kafka/jmh/record/RecordBatchIterationBenchmark.java

  2. A reduction in buffer allocations, as demonstrated by a unit test.

References

Changes in downstream zstd-jni

Add new interface:
luben/zstd-jni@d65490e

Bug fixes in the new interface:
luben/zstd-jni@8bf8066438785ce55b62fc7e6816faafe1e3b39e
luben/zstd-jni@100c434
luben/zstd-jni@355b8511a2967d097a619047a579930cac2ccd9d

@soarez (Member) left a comment

Nice one!

@@ -269,7 +269,7 @@ public int partitionLeaderEpoch() {

     public InputStream recordInputStream(BufferSupplier bufferSupplier) {
         final ByteBuffer buffer = this.buffer.duplicate();
-        buffer.position(RECORDS_OFFSET);
+        buffer.position(buffer.position() + RECORDS_OFFSET);
@soarez (Member)

Is this change related? Perhaps add a comment on why this is changing?

@divijvaidya (Contributor, Author)

Not a related change, but something I found while lurking in the code. I will probably revert it from this PR so that we don't pollute it with unnecessary changes.

@soarez (Member) left a comment

LGTM

@divijvaidya divijvaidya requested review from showuon and dajac June 9, 2023 11:32
@dajac (Contributor) commented Jun 10, 2023

@divijvaidya Nice one! Out of curiosity, have you tried to run kafka-producer-perf-test.sh before/after the patch?

@github-actions bot commented Sep 9, 2023

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions bot removed the "stale" label Nov 20, 2023
@mimaison (Member) commented

It would be good to get this merged.
@divijvaidya Can you share the code for the micro benchmark you mention? Do you see an impact when running the kafka-producer-perf-test/kafka-consumer-perf-test tools?
