
WIP. HADOOP-17124. Support LZO Codec using aircompressor #3612

Draft
wants to merge 5 commits into
base: trunk
from


viirya
Member

@viirya viirya commented Nov 3, 2021

Description of PR

This patch adds an LZO codec based on aircompressor, including LzoCompressor and LzoDecompressor (WIP).

See https://issues.apache.org/jira/browse/HADOOP-17124 for details.
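Hadoop's `Compressor` and `Decompressor` interfaces closely mirror `java.util.zip.Deflater` and `Inflater` (`setInput`, `needsInput`, `finish`, `finished`), so the loop a stream wrapper runs around an implementation such as `LzoCompressor` can be sketched with the JDK classes as stand-ins. This is a minimal illustration under that assumption, not code from this patch; `StreamingCodecSketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: Deflater/Inflater stand in for a Hadoop Compressor/Decompressor,
// whose setInput/needsInput/finish/finished methods follow the same contract.
final class StreamingCodecSketch {
  private StreamingCodecSketch() {}

  /** Drives a compressor the way a compression output stream would. */
  static byte[] compressAll(byte[] data) {
    Deflater compressor = new Deflater();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64]; // deliberately small so the loop runs several times
    try {
      compressor.setInput(data, 0, data.length);
      compressor.finish(); // signal that no more input will follow
      while (!compressor.finished()) {
        int n = compressor.deflate(buf, 0, buf.length);
        out.write(buf, 0, n);
      }
    } finally {
      compressor.end(); // release the underlying native zlib state
    }
    return out.toByteArray();
  }

  /** Drives a decompressor until it reports the end of the compressed stream. */
  static byte[] decompressAll(byte[] compressed) throws DataFormatException {
    Inflater decompressor = new Inflater();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    try {
      decompressor.setInput(compressed, 0, compressed.length);
      while (!decompressor.finished()) {
        int n = decompressor.inflate(buf, 0, buf.length);
        if (n == 0 && decompressor.needsInput()) {
          throw new DataFormatException("truncated compressed input");
        }
        out.write(buf, 0, n);
      }
    } finally {
      decompressor.end();
    }
    return out.toByteArray();
  }
}
```

For any byte array `b`, `decompressAll(compressAll(b))` should return a copy of `b`; the 64-byte buffer is intentionally small so that both `finished()` loops iterate more than once.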

The best-known user of aircompressor is Trino, which uses the library for its Lz4Codec, LzoCodec, SnappyCodec, etc. The relevant code is here:

https://github.com/trinodb/trino/blob/fe608f2723842037ff620d612a706900e79c52c8/lib/trino-rcfile/src/main/java/io/trino/rcfile/AircompressorCodecFactory.java

How was this patch tested?

Unit tests.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@viirya viirya marked this pull request as draft November 3, 2021 02:32
@viirya
Member Author

viirya commented Nov 3, 2021

cc @sunchao @dbtsai

@viirya
Member Author

viirya commented Nov 3, 2021

LzoCompressor is ready for review. I will work on LzoDecompressor next.

Comment on lines 384 to 388
<groupId>com.hadoop.gplcompression</groupId>
<artifactId>hadoop-lzo</artifactId>
<version>0.4.21-SNAPSHOT</version>
<scope>test</scope>
</dependency>
Member Author

This is here only to verify the new LZO codec. We will remove it before merging.

/**
* This class creates lzo compressors/decompressors.
*/
public class LzoCodec2 implements Configurable, CompressionCodec {
Member Author

We will rename this to LzoCodec before merging.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 18s #3612 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #3612
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/1/console
versions git=2.17.1
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Comment on lines +2466 to +2471
<repository>
<id>twitter</id>
<url>https://maven.twttr.com/</url>
</repository>
Member Author

We will remove this before merging.

@viirya
Member Author

viirya commented Nov 3, 2021

Compared with #2159, this adds support for LzoCompressor and LzoDecompressor (WIP). I don't add bridging classes with the same names as com.hadoop.compression.lzo.LzoCodec etc., because users might be unaware of the implicit codec change, and I also think some users may still want to stick with the original codec implementation.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 12m 59s Maven dependency ordering for branch
+1 💚 mvninstall 25m 3s trunk passed
+1 💚 compile 24m 12s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 20m 7s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 55s trunk passed
+1 💚 mvnsite 2m 4s trunk passed
+1 💚 javadoc 1m 31s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 2s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+0 🆗 spotbugs 0m 32s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 24m 19s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 19s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 11s /patch-mvninstall-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
-1 ❌ compile 0m 33s /patch-compile-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.
-1 ❌ javac 0m 33s /patch-compile-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt root in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.
-1 ❌ compile 0m 30s /patch-compile-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.
-1 ❌ javac 0m 30s /patch-compile-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt root in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 3m 30s /results-checkstyle-root.txt root: The patch generated 24 new + 69 unchanged - 0 fixed = 93 total (was 69)
-1 ❌ mvnsite 0m 12s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
-1 ❌ javadoc 0m 12s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.
-1 ❌ javadoc 0m 12s /patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt hadoop-common in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.
+0 🆗 spotbugs 0m 13s hadoop-project has no data from spotbugs
-1 ❌ spotbugs 0m 12s /patch-spotbugs-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
-1 ❌ shadedclient 1m 1s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 12s hadoop-project in the patch passed.
-1 ❌ unit 0m 11s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 asflicense 0m 27s The patch does not generate ASF License warnings.
130m 7s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/2/artifact/out/Dockerfile
GITHUB PR #3612
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml spotbugs checkstyle
uname Linux 9194c18eb7ec 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4265294
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/2/testReport/
Max. process+thread count 518 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@viirya
Member Author

viirya commented Nov 3, 2021

Hmm, for the test, I built hadoop-lzo 0.4.21-SNAPSHOT locally with the native-lzo library for Mac OS. I switched to 0.4.20 for CI to see if that works.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 56s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 14s Maven dependency ordering for branch
+1 💚 mvninstall 24m 12s trunk passed
+1 💚 compile 23m 37s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 21m 48s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 52s trunk passed
+1 💚 mvnsite 2m 0s trunk passed
+1 💚 javadoc 1m 31s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 2m 2s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+0 🆗 spotbugs 0m 32s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 24m 23s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 1m 10s the patch passed
+1 💚 compile 26m 28s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 26m 28s the patch passed
+1 💚 compile 21m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 21m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 11s /results-checkstyle-root.txt root: The patch generated 24 new + 69 unchanged - 0 fixed = 93 total (was 69)
+1 💚 mvnsite 2m 9s the patch passed
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
-1 ❌ javadoc 1m 5s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.
-1 ❌ javadoc 1m 38s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt hadoop-common-project_hadoop-common-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
+0 🆗 spotbugs 0m 31s hadoop-project has no data from spotbugs
+1 💚 shadedclient 25m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 26s hadoop-project in the patch passed.
-1 ❌ unit 17m 46s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
228m 17s
Reason Tests
Failed junit tests hadoop.io.file.tfile.TestTFileLzoCodecsStreams
hadoop.io.file.tfile.TestTFileLzoCodecsByteArrays
hadoop.io.compress.TestCodec
hadoop.io.file.tfile.TestTFileSeqFileComparison
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/3/artifact/out/Dockerfile
GITHUB PR #3612
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml spotbugs checkstyle
uname Linux 158ecb1201cb 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e7d98ab
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/3/testReport/
Max. process+thread count 1261 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3612/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@viirya
Member Author

viirya commented Nov 3, 2021

Hmm, it seems this does not work.

[ERROR] testLzoCodec(org.apache.hadoop.io.compress.TestCodec)  Time elapsed: 0.004 s  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
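An error like the one above is what a JNI-backed codec typically reports when its shared library cannot be loaded from `java.library.path`. A minimal sketch of that guard pattern follows; `NativeLibraryGuard` is hypothetical and this is not hadoop-lzo's actual loader.

```java
// Hypothetical sketch: try to load a JNI library once, remember the outcome,
// and fail fast with a descriptive error when a native-only codec is requested.
final class NativeLibraryGuard {
  private final String libraryName;
  private final boolean loaded;
  private final Throwable loadError;

  NativeLibraryGuard(String libraryName) {
    this.libraryName = libraryName;
    boolean ok = false;
    Throwable err = null;
    try {
      // Maps the name to a platform file (e.g. lib<name>.so on Linux)
      // and searches java.library.path for it.
      System.loadLibrary(libraryName);
      ok = true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      err = e; // keep the cause for diagnostics
    }
    this.loaded = ok;
    this.loadError = err;
  }

  boolean isLoaded() {
    return loaded;
  }

  /** Throws if the native library could not be loaded. */
  void checkAvailable() {
    if (!loaded) {
      throw new RuntimeException(libraryName + " library not available", loadError);
    }
  }
}
```

For example, `new NativeLibraryGuard("gplcompression")` would look for `libgplcompression.so` on Linux; if no directory on `java.library.path` contains it, `checkAvailable()` throws, which is consistent with the failures seen in CI.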

I checked hadoop-lzo-0.4.20.jar. It includes native libraries for the Linux-amd64-64 target, e.g.:

native/Linux-amd64-64/lib/libgplcompression.a
native/Linux-amd64-64/lib/libgplcompression.so

Maybe the CI machine is not that target?
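Whether a bundled library can be used depends on whether the JVM's platform matches one of the `native/<platform>/lib` directories in the jar. The sketch below assumes the directory name is built as `os.name-os.arch-sun.arch.data.model` with spaces replaced by underscores (which yields `Linux-amd64-64` on a 64-bit Linux x86-64 JVM); `NativePlatform` and its methods are hypothetical, and hadoop-lzo's own lookup logic may differ.

```java
// Hypothetical helper for checking which bundled native directory a JVM would need.
final class NativePlatform {
  private NativePlatform() {}

  /** Assumed naming scheme: osName-osArch-dataModel, spaces replaced by underscores. */
  static String platformName(String osName, String osArch, String dataModel) {
    return (osName + "-" + osArch + "-" + dataModel).replace(' ', '_');
  }

  static String currentPlatformName() {
    // sun.arch.data.model is HotSpot-specific, so fall back if it is absent.
    return platformName(
        System.getProperty("os.name"),
        System.getProperty("os.arch"),
        System.getProperty("sun.arch.data.model", "unknown"));
  }

  /** True if a jar on the classpath has native/<platform>/lib/<fileName> (layout from the jar listing). */
  static boolean hasBundledLibrary(String platform, String fileName) {
    String resource = "/native/" + platform + "/lib/" + fileName;
    return NativePlatform.class.getResource(resource) != null;
  }
}
```

Under this assumption, a JVM whose `currentPlatformName()` is not `Linux-amd64-64` would find no matching directory in hadoop-lzo-0.4.20.jar, and a macOS machine would need the library built locally.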

So it seems we cannot run the comparison test between the GPL LZO codec and this LZO codec on Hadoop CI.

I have run them locally to verify the comparison. Reviewers can build and install GPL LZO locally to run the test.
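The comparison test checks cross-compatibility: data compressed by one implementation must decompress correctly with the other, in both directions. Since neither hadoop-lzo nor aircompressor is assumed on the classpath, the sketch below uses two JDK zlib code paths (`DeflaterOutputStream`/`InflaterInputStream` and raw `Deflater`/`Inflater`) as stand-ins for the GPL LZO codec and the new codec; `CrossCodecCheck` is a hypothetical name and this is not the actual `TestCodec` code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Sketch of a cross-implementation compatibility check. The two JDK zlib code
// paths below are stand-ins for two independent codecs of the same format.
final class CrossCodecCheck {
  interface Codec {
    byte[] compress(byte[] data) throws IOException;
    byte[] decompress(byte[] data) throws IOException;
  }

  private CrossCodecCheck() {}

  /** True if each codec can decode what the other one encoded. */
  static boolean compatible(Codec a, Codec b, byte[] data) throws IOException {
    return Arrays.equals(data, b.decompress(a.compress(data)))
        && Arrays.equals(data, a.decompress(b.compress(data)));
  }

  // Stand-in #1: stream wrappers with the default compression level.
  static final Codec STREAMS = new Codec() {
    @Override
    public byte[] compress(byte[] data) throws IOException {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
        out.write(data);
      }
      return bos.toByteArray();
    }

    @Override
    public byte[] decompress(byte[] data) throws IOException {
      try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
        return in.readAllBytes(); // Java 9+
      }
    }
  };

  // Stand-in #2: Deflater/Inflater driven directly, with a different level.
  static final Codec RAW = new Codec() {
    @Override
    public byte[] compress(byte[] data) {
      Deflater deflater = new Deflater(Deflater.BEST_SPEED);
      try {
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
          out.write(buf, 0, deflater.deflate(buf));
        }
        return out.toByteArray();
      } finally {
        deflater.end();
      }
    }

    @Override
    public byte[] decompress(byte[] data) throws IOException {
      Inflater inflater = new Inflater();
      try {
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
          int n = inflater.inflate(buf);
          if (n == 0 && inflater.needsInput()) {
            throw new IOException("truncated compressed stream");
          }
          out.write(buf, 0, n);
        }
        return out.toByteArray();
      } catch (DataFormatException e) {
        throw new IOException(e);
      } finally {
        inflater.end();
      }
    }
  };
}
```

Checking both directions matters: a decoder that accepts its own output but rejects the other implementation's output would pass a simple round-trip test yet still break reading existing files.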

Once the reviewers think it is okay, I will remove the GPL LZO codec-related changes.

@sunchao @dbtsai

@sunchao
Member

sunchao commented Nov 3, 2021

Thanks @viirya, I'll take a look soon.

@sunchao
Member

sunchao commented Nov 3, 2021

@viirya can you share how to test this locally? I got the same error as above and am curious why it didn't work even though hadoop-lzo is already a test dependency.

@viirya
Member Author

viirya commented Nov 3, 2021

@sunchao It doesn't include a native library for Mac OS X, so we need to build it.

I built and installed https://github.com/twitter/hadoop-lzo locally. You may need to revert e7d98ab to use the 0.4.21-SNAPSHOT version built from source.

@sunchao
Member

sunchao commented Nov 3, 2021

Cool, thanks. I verified locally and the relevant tests in TestCodec all passed.

@viirya
Member Author

viirya commented Nov 3, 2021

Thanks @sunchao. So I think we have verified the new LZO compressor against the GPL LZO compressor.

@sunchao
Member

sunchao commented Nov 4, 2021

Yes, looks OK so far. For reference, could you add to the PR description which other projects use aircompressor for the same purpose (ideally with a link to the related code)?

Feel free to update the PR and mark it as ready for review.

@viirya
Member Author

viirya commented Nov 18, 2021

Thanks @sunchao !

I think the best-known user of aircompressor is Trino, which uses the library for its Lz4Codec, LzoCodec, SnappyCodec, etc. The relevant code is here:

https://github.com/trinodb/trino/blob/fe608f2723842037ff620d612a706900e79c52c8/lib/trino-rcfile/src/main/java/io/trino/rcfile/AircompressorCodecFactory.java

I also updated the description.
