Use a larger buffer size for java.util.zip.*Stream classes #20316
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Force-pushed from 121935f to 2327bde.
Thanks!! Just to clarify: we are already using JDK 21 as the embedded JDK, so do we still need this change? Or is this mostly aimed at improving performance when running Bazel without the embedded JDK?
@bazel-io flag
As stated in the PR description, there is an open issue to standardise all of these buffers on 8192 bytes, as the ones in java.util.zip are smaller for historical reasons: https://bugs.openjdk.org/browse/JDK-8242864 If you look at the latest JDK 22 tag (jdk-22+25), which is further along than JDK 21, the default is still 512: https://github.com/openjdk/jdk/blob/jdk-22%2B25/src/java.base/share/classes/java/util/zip/GZIPInputStream.java#L91
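For reference, the one-argument constructor at that linked line still just delegates with a 512-byte buffer; paraphrased from the linked JDK source, it is roughly:

```
// Paraphrased from GZIPInputStream.java in the jdk-22+25 tag.
public GZIPInputStream(InputStream in) throws IOException {
    // Creates the stream with the historic 512-byte default buffer size.
    this(in, 512);
}
```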
@bazel-io fork 7.0.0
What does
Hi @EdSchouten, could you respond to the above comment? This PR is awaiting merge.
Hey there! Yes, Kushal is a colleague of mine. I'm still waiting for him to be added to the right Google Group for the CLA approval to go through. I'll keep you posted when that's done.
`DeflaterInputStream`, `GZIPInputStream`, `GZIPOutputStream`, and `InflaterInputStream` all use an internal byte buffer of 512 bytes by default. Whenever the wrapped stream exceeds this size, a full copy to a new buffer occurs, and the buffer grows in increments of the same size. For example, a stream of length 2K will be copied four times. Increasing the size of the buffer we use can result in significant reductions in CPU usage (read: copies).

Examples in the repository
--------------------------

There are already two places where we increase the default size of these buffers:

- `//src/main/java/com/google/devtools/build/lib/bazel/repository/TarGzFunction.java`
- `//src/main/java/com/google/devtools/build/lib/bazel/repository/downloader/HttpStream.java`

Prior art
---------

There is an open enhancement issue in the OpenJDK tracker on this, which contains a benchmark for `InflaterOutputStream`:

> Increase the default, internal buffer size of the Streams in `java.util.zip`
> https://bugs.openjdk.org/browse/JDK-8242864

A similar change was merged for JDK 15+ in 2020:

> Improve performance of `InflaterOutputStream.write()`
> https://bugs.openjdk.org/browse/JDK-8242848

Providing a simple benchmark
----------------------------

I'm inlining a simple `jmh` benchmark and the results underneath it for one `GZIPInputStream` case.

The benchmark:

```
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@Threads(1)
@Warmup(iterations = 2)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class GZIPInputStreamBenchmark {

  // Uncompressed payload sizes to test, in bytes.
  @Param({"1024", "3072", "9216"})
  long inputLength;

  // Buffer sizes passed to the GZIPInputStream constructor.
  @Param({"512", "1024", "4096", "8192"})
  int bufferSize;

  private byte[] content;

  @Setup(Level.Iteration)
  public void setup() throws IOException {
    var baos = new ByteArrayOutputStream();
    // No need to set the buffer size on this as it's a one-time cost for setup
    // and not counted in the result.
    var gzip = new GZIPOutputStream(baos);
    var inputBytes = generateRandomByteArrayOfLength(inputLength);
    gzip.write(inputBytes);
    gzip.finish();
    this.content = baos.toByteArray();
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  public void getGzipInputStream(Blackhole bh) throws IOException {
    try (var is = new ByteArrayInputStream(this.content);
        var gzip = new GZIPInputStream(is, bufferSize)) {
      bh.consume(gzip.readAllBytes());
    }
  }

  byte[] generateRandomByteArrayOfLength(long length) {
    var random = new Random();
    var intStream = random.ints(0, 5000).limit(length).boxed();
    return intStream.collect(
        ByteArrayOutputStream::new,
        (baos, i) -> baos.write(i.intValue()),
        (baos1, baos2) -> baos1.write(baos2.toByteArray(), 0, baos2.size())
    ).toByteArray();
  }
}
```

The results:

```
Benchmark                                    (bufferSize)  (inputLength)  Mode  Cnt      Score    Error  Units
GZIPInputStreamBenchmark.getGzipInputStream           512           1024  avgt    5   3207.217 ± 24.919  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           3072  avgt    5   5874.191 ±  5.827  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           9216  avgt    5  15567.345 ± 93.281  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           1024  avgt    5   2580.566 ± 14.566  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           3072  avgt    5   4154.582 ± 16.016  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           9216  avgt    5   9942.521 ± 61.215  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           1024  avgt    5   2150.255 ± 52.770  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           3072  avgt    5   2289.185 ± 71.396  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           9216  avgt    5   5656.891 ± 28.499  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           1024  avgt    5   2177.427 ± 30.896  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           3072  avgt    5   2517.390 ± 21.296  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           9216  avgt    5   5227.932 ± 55.525  ns/op
```

Co-authored-by: Kushal Pisavadia <kushal.p@apple.com>
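To make the effect at call sites concrete, here is a minimal sketch of the kind of change involved. The class, helper, and constant names are illustrative only, not the exact code in this PR; 8192 is the size proposed as the new default in JDK-8242864.

```
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only; names here are hypothetical, not the code merged in this PR.
final class ZipBufferSizes {
  // 8192 bytes, matching the size proposed as the new default in JDK-8242864.
  private static final int BUFFER_SIZE = 8192;

  // Instead of new GZIPInputStream(in), which falls back to the 512-byte default...
  static GZIPInputStream gzipInput(InputStream in) throws IOException {
    // ...pass the buffer size explicitly via the two-argument constructor.
    return new GZIPInputStream(in, BUFFER_SIZE);
  }

  static GZIPOutputStream gzipOutput(OutputStream out) throws IOException {
    return new GZIPOutputStream(out, BUFFER_SIZE);
  }

  private ZipBufferSizes() {}
}
```

`InflaterInputStream` and `DeflaterInputStream` take the buffer size as the third constructor argument (after an explicit `Inflater`/`Deflater`), so the change amounts to passing the larger size wherever these streams are constructed.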
Force-pushed from 2327bde to 258ca75.
CLA has been sorted out. I've added the Co-authored-by line to the commit message, and the CLA bot picked it up.
Thanks @EdSchouten and @KushalP!
…20642)

Closes #20316.
Commit 75a6693
PiperOrigin-RevId: 588444920
Change-Id: I1fb47f0b08dcb8d72f3e2c43534c33d60efb87f2
Co-authored-by: Ed Schouten <eschouten@apple.com>
The changes in this PR have been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting `USE_BAZEL_VERSION=last_rc`.