Update to Hadoop 3.3.5 #44
676fd46
to
cd71fb8
Compare
@zielmicha I'd appreciate it if you could help test this PR. Also, it looks like all the binaries need to be built by a project maintainer (#27 (comment)). We'll need some help from @electrum.
I made some small fixes to make the main Trino repo compile here: I tried to test it using a test Trino/Hive cluster and I'm getting weird errors: NoClassDefFoundError for |
javax.annotation: zielmicha@4ff5ab6#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R597-R600
okhttp: https://mvnrepository.com/artifact/com.squareup.okhttp/okhttp
kotlin-stdlib
Somehow Kotlin has been included in
KerberosUtil: KerberosUtil has been removed since the patches are already included in 3.3.1.
@zielmicha I have successfully run queries on a Hadoop 3.3.x cluster.
javax.annotation: |
Ok. I added the relocation rule for javax.annotation. I removed |
After the last round of changes, I no longer see any errors at runtime. My guess is this problem was related to KerberosUtil ( I'll do a few more manual tests (note the configuration I have does not use HDFS, so I don't really test "against" Hadoop, just confirming that the libraries work okay internally). I'll also run the regression test suite. The okhttp3 rule ( |
I've made another small change to pom.xml: zielmicha@ca49d40 The regression tests in the main repo now pass.
You are right.
I tested the current version on a small test cluster with the Hive connector (using NFS for storage, not HDFS). Not sure what other testing we should do; maybe it makes sense to request review from maintainers now?
@oneonestar are you okay with requesting review from maintainers for your PR now?
Sure. Let's go for it.
cc @electrum
@oneonestar @zielmicha With version 400, io.trino.rcfile.TestRcFileReader failed with "java.lang.NoClassDefFoundError: org/xerial/snappy/Snappy".
@KarlManong Thanks for the info. ORC Reader doesn't need |
<artifact>org.apache.hadoop:hadoop-auth</artifact>
<excludes>
    <exclude>org/apache/hadoop/security/authentication/util/KerberosUtil.class</exclude>
    <exclude>org/apache/hadoop/security/authentication/util/KerberosUtil$*.class</exclude>
I think the subsequent filter for org.apache.hadoop:hadoop-azure needs an update. I got a class-not-found error when running TestDeltaLakeAdlsConnectorSmokeTest locally.
I have updated the filter for org.apache.hadoop:hadoop-azure.
However, I can't run TestDeltaLakeAdlsConnectorSmokeTest because I don't have access to azure-abfs.
@oneonestar hello! Is there any movement on this PR? We get the error "HadoopIllegalArgumentException: Invalid buffer, not of length X" when querying Hive erasure coding tables using Trino with Apache Hadoop 3.2.0. This error is fixed in Apache Hadoop 3.3.0 ("EC : Decoding is failing when block group last incomplete cell fall in to AlignedStripe"); however, Trino is still using an old version of Apache Hadoop. Is there any way we can push this forward?
912a892
to
43a45a1
Compare
@mladjan-gadzic We faced the same problem before. We applied this patch to our internal Trino cluster and it works fine with EC. There are a few problems we have to solve before this PR can be merged:
@oneonestar thank you for the quick answer!
Unfortunately I am using macOS with an M1 chip, which introduces a whole new level of issues when tinkering with architecture-dependent stuff. Because of this I am unable to compile the native libs. What I can do is check whether someone from my organization can do that.
I will try to do this and get back to you. EDIT: how are Trino integration tests usually run?
What are the pros and cons of accepting kotlin-stdlib as a transitive dependency? Are there any downsides to shading aside from what shading brings to the table itself?
Hi @oneonestar! Just a reminder to check my comment. I am eager to help push this PR forward.
@oneonestar @electrum - any chance we could get some eyes on this? Thanks in advance!
Hadoop 3.3.5 was released on 2023-03-15: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/release/3.3.5/CHANGELOG.3.3.5.html
Hello @electrum, @oneonestar, @ebyhr - checking in - any progress?
@oneonestar apologies for the very long delay on this. I updated your branch to use Hadoop 3.3.5 and copied the Hadoop native libraries from the official Hadoop releases (they have |
Supersedes #37
Changes
src/main/java/io/trino/hadoop/SocksSocketFactory.java
src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
src/main/java/org/apache/hadoop/fs/FileSystem.java
final keyword
src/main/java/org/apache/hadoop/fs/ForwardingFileSystemCache.java
src/main/java/org/apache/hadoop/util/LineReader.java
io/trino/hadoop/TestHadoopNative.java
Removed files
src/main/java/org/apache/hadoop/security/authentication/util/KerberosUtil.java
src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
src/main/java/org/wildfly/openssl/OpenSSLProvider.java
pom.xml
Update slf4j to 1.7.36 in order to align with hadoop
https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-project/pom.xml#L81
Add jsr305 for javax.annotation for JDK > 9
Add a dependency to org.lz4:lz4-java
The version is 1.7.1 in order to align with hadoop-project.
https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-project/pom.xml#L146
Add a provided dependency to org.xerial.snappy:snappy-java.
This is caused by HADOOP-17125.
Since snappy-java is already a dependency for Trino, scope=provided is enough for the test to pass.
The version is 1.1.8.2 in order to align with hadoop-project.
https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-project/pom.xml#L145
Add relocations for the new dependencies that come from Hadoop
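For readers who want to see what the dependency and relocation items above could look like in practice, here is a minimal sketch of a pom.xml fragment. It is not the actual diff of this PR. The coordinates and versions for lz4-java and snappy-java are the ones named above; the `example.shaded` prefix is a hypothetical placeholder for whatever shade base the project really uses, and the plugin configuration is reduced to the two relocations only.

```xml
<!-- Sketch only: illustrates the items listed above, not the PR's real pom.xml. -->
<project>
  <dependencies>
    <!-- Bundled dependency, version aligned with hadoop-project. -->
    <dependency>
      <groupId>org.lz4</groupId>
      <artifactId>lz4-java</artifactId>
      <version>1.7.1</version>
    </dependency>

    <!-- scope=provided: needed on the classpath at compile and test time,
         but expected to be supplied by the consumer (Trino) at runtime,
         so it is not packaged into the shaded jar. -->
    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.1.8.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <configuration>
          <relocations>
            <!-- "example.shaded" is a hypothetical prefix (assumption). -->
            <relocation>
              <pattern>javax.annotation</pattern>
              <shadedPattern>example.shaded.javax.annotation</shadedPattern>
            </relocation>
            <!-- net.jpountz is the Java package used by lz4-java. -->
            <relocation>
              <pattern>net.jpountz</pattern>
              <shadedPattern>example.shaded.net.jpountz</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

A relocation rewrites both the bundled classes and the references to them, so a relocated package cannot clash with another copy on the consumer's classpath. A provided dependency is left out of the shaded jar, which is presumably why snappy-java has no relocation rule in this sketch.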
TODO