HADOOP-13126 Add BrotliCodec based on Brotli4j library #2723

Open: wants to merge 2 commits into base: trunk

martin-g
Member

Adds BrotliCodec - a compression codec based on Google Brotli

This PR continues the work done by @rdblue at https://issues.apache.org/jira/browse/HADOOP-13126
His patches were based on the jbrotli library, which has not been maintained for a few years. This PR uses Brotli4j instead.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 1m 3s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 12m 33s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 24m 28s | | trunk passed |
| +1 💚 | compile | 27m 12s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 22m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | checkstyle | 3m 59s | | trunk passed |
| +1 💚 | mvnsite | 2m 9s | | trunk passed |
| +1 💚 | javadoc | 1m 37s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 2m 9s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +0 🆗 | spotbugs | 0m 33s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 18m 58s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 38s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 19s | | the patch passed |
| +1 💚 | compile | 26m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| -1 ❌ | javac | 26m 32s | /results-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt | root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 4 new + 1918 unchanged - 0 fixed = 1922 total (was 1918) |
| +1 💚 | compile | 22m 36s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | javac | 22m 36s | | the patch passed |
| +1 💚 | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 3m 52s | /results-checkstyle-root.txt | root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | mvnsite | 2m 3s | | the patch passed |
| +1 💚 | xml | 0m 3s | | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 1m 33s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 2m 9s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +0 🆗 | spotbugs | 0m 29s | | hadoop-project has no data from spotbugs |
| +1 💚 | shadedclient | 18m 42s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 28s | | hadoop-project in the patch passed. |
| +1 💚 | unit | 17m 56s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 0m 52s | | The patch does not generate ASF License warnings. |
| | | 221m 29s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/1/artifact/out/Dockerfile |
| GITHUB PR | #2723 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml spotbugs checkstyle |
| uname | Linux b4b616a90fa4 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 93bcb41ef3f0f4a6fae42143d169a532977f15ba |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/1/testReport/ |
| Max. process+thread count | 2902 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@martin-g martin-g force-pushed the HADOOP-13126-add-BrotliCodec-based-on-Brotli4j branch from 93bcb41 to 47f0593 Compare June 23, 2022 12:15
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 38s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 14m 47s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 25m 0s | | trunk passed |
| +1 💚 | compile | 23m 12s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 20m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 4m 27s | | trunk passed |
| +1 💚 | mvnsite | 3m 26s | | trunk passed |
| +1 💚 | javadoc | 3m 1s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 2m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 🆗 | spotbugs | 1m 26s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 23m 24s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 44s | | Maven dependency ordering for patch |
| -1 ❌ | mvninstall | 0m 28s | /patch-mvninstall-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 ❌ | compile | 0m 53s | /patch-compile-root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt | root in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| -1 ❌ | javac | 0m 53s | /patch-compile-root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt | root in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| -1 ❌ | compile | 0m 47s | /patch-compile-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt | root in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. |
| -1 ❌ | javac | 0m 47s | /patch-compile-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt | root in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 26s | /results-checkstyle-root.txt | root: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| -1 ❌ | mvnsite | 0m 32s | /patch-mvnsite-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 ❌ | javadoc | 0m 29s | /patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt | hadoop-common in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| -1 ❌ | javadoc | 0m 28s | /patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt | hadoop-common in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. |
| +0 🆗 | spotbugs | 0m 23s | | hadoop-project has no data from spotbugs |
| -1 ❌ | spotbugs | 0m 30s | /patch-spotbugs-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| -1 ❌ | shadedclient | 1m 45s | | patch has errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 20s | | hadoop-project in the patch passed. |
| -1 ❌ | unit | 0m 27s | /patch-unit-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch failed. |
| +1 💚 | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 141m 37s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/2/artifact/out/Dockerfile |
| GITHUB PR | #2723 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
| uname | Linux 19f057b42df7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 47f0593 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/2/testReport/ |
| Max. process+thread count | 761 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 37s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 14m 55s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 25m 28s | | trunk passed |
| +1 💚 | compile | 23m 56s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 20m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 4m 28s | | trunk passed |
| +1 💚 | mvnsite | 3m 26s | | trunk passed |
| +1 💚 | javadoc | 2m 49s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 2m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 🆗 | spotbugs | 1m 26s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 💚 | shadedclient | 23m 36s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 39s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 27s | | the patch passed |
| +1 💚 | compile | 22m 4s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| -1 ❌ | javac | 22m 4s | /results-compile-javac-root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt | root-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 4 new + 2879 unchanged - 0 fixed = 2883 total (was 2879) |
| +1 💚 | compile | 20m 35s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 20m 35s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 19s | /results-checkstyle-root.txt | root: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 💚 | mvnsite | 3m 28s | | the patch passed |
| +1 💚 | javadoc | 2m 55s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 2m 35s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 🆗 | spotbugs | 1m 13s | | hadoop-project has no data from spotbugs |
| +1 💚 | shadedclient | 23m 42s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 1m 7s | | hadoop-project in the patch passed. |
| +1 💚 | unit | 18m 27s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 36s | | The patch does not generate ASF License warnings. |
| | | 236m 57s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/3/artifact/out/Dockerfile |
| GITHUB PR | #2723 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
| uname | Linux b9eb43b210c0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f48cf20 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/3/testReport/ |
| Max. process+thread count | 3152 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-common-project/hadoop-common U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2723/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@martin-g
Member Author

What is the policy on using Object#finalize() in the compress package?
The last check fails because of 4 new usages of finalize in the Brotli[De]Compressor classes.

grep -rnHi finalize ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/brotli/BrotliCompressor.java:251:  protected void finalize() throws Throwable {
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/brotli/BrotliCompressor.java:252:    super.finalize();
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/brotli/BrotliDecompressor.java:173:  protected void finalize() throws Throwable {
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/brotli/BrotliDecompressor.java:174:    super.finalize();
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/bzip2/CBZip2OutputStream.java:711:  protected void finalize() throws Throwable {
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/bzip2/CBZip2OutputStream.java:713:    super.finalize();
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/ZlibDecompressor.java:294:  protected void finalize() {
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zstd/ZStandardDecompressor.java:249:  protected void finalize() {
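
For context on the question above: Object#finalize() has been deprecated since Java 9, which would explain why the new warnings show up only in the JDK 11 javac run. One possible alternative is to release native resources explicitly and idempotently, so that no finalizer is needed. The following is only a minimal sketch of that idea, not the code in this PR: `ExplicitEndSketch`, `FakeNativeHandle`, and `SketchCompressor` are hypothetical names, and the handle is a plain counter standing in for a native Brotli encoder.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ExplicitEndSketch {

  /** Hypothetical stand-in for a native encoder handle; not a Brotli4j class. */
  public static final class FakeNativeHandle {
    /** Number of handles that are currently allocated and not yet destroyed. */
    public static final AtomicInteger OPEN = new AtomicInteger();

    FakeNativeHandle() {
      OPEN.incrementAndGet();
    }

    void destroy() {
      OPEN.decrementAndGet();
    }
  }

  /** Hypothetical compressor-like holder that frees its handle in end(). */
  public static final class SketchCompressor implements AutoCloseable {
    private final FakeNativeHandle handle = new FakeNativeHandle();
    private final AtomicBoolean ended = new AtomicBoolean(false);

    /** Releases the handle exactly once; repeated calls are no-ops. */
    public void end() {
      if (ended.compareAndSet(false, true)) {
        handle.destroy();
      }
    }

    /** Lets callers use try-with-resources instead of relying on finalize(). */
    @Override
    public void close() {
      end();
    }

    public boolean isEnded() {
      return ended.get();
    }
  }

  public static void main(String[] args) {
    try (SketchCompressor c = new SketchCompressor()) {
      System.out.println("open handles inside try: " + FakeNativeHandle.OPEN.get());
    }
    System.out.println("open handles after try: " + FakeNativeHandle.OPEN.get());
  }
}
```

The trade-off is that callers become responsible for calling end() (or close()); a forgotten call leaks the handle, whereas a finalizer would eventually reclaim it at some unspecified time. Whether that is acceptable here depends on the project's policy, which is exactly the open question.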

@ibobak

ibobak commented Oct 28, 2022

Colleagues,

I took the source code from commit 47f0593, built a jar from it, plugged it into my Spark cluster, and launched a huge job with many transformations and actions. I found a serious memory leak: executors keep consuming more and more RAM (despite a 20GB limit, they consumed 40GB).

I've made my own version of the Brotli codec (also based on brotli4j), modeled on how the Snappy codec and others are implemented, and it works with no memory leaks. I'll post my PR soon.

@martin-g
Member Author

@ibobak If the change is small, you can also tell me what to change and I can update this PR.
But there does not seem to be much interest in having BrotliCodec. This PR has been open for almost 2 years ...

@ibobak

ibobak commented Oct 31, 2022

The update is big. I am now testing my version of the codec in my organization. I won't post a PR until I am sure that it works fine and without memory leaks. I need a little more time.

@hyperxpro

What is still pending in this PR? I can try to help and get this merged.

Brotli4j maintainer here ^_^

@martin-g
Member Author

@hyperxpro feel free to edit this PR or open a new one based on it.

4 participants