-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23654][BUILD] remove jets3t as a dependency of spark #21146
[SPARK-23654][BUILD] remove jets3t as a dependency of spark #21146
Conversation
Test build #89800 has finished for PR 21146 at commit
|
As promised, dependencies fail
|
Seems OK to remove if it's really not being used, except that it means it's not leaking onto the user classpath anymore, right? that is, this can potentially cause user code to stop working if this goes in 2.4, or am I missing something? Could go in for 3.0. The other dependency changes seem OK. |
It's not going to stop user code from working as the bouncy castle version on the classpath means that Jets3t doesn't actually work. The fact that nobody has complained about this must count as a metric of how many people use jets3t :) More of an issue is the fact that the httpclient in 2.3+ isn't compatible with the AWS SDK in Hadoop 2.8.x. ... |
The transitive dependencies are a separate issue. Jets3t pulls in 3 JARs which nothing else seems to need, but which transitively go onto the spark CP downgraded
missing
|
It seems reasonable to proceed to remove this, and just reflect the rest of the dep changes accordingly. I wonder if updating bcprov actually makes it work or something? Or, could one say that updating bouncy castle at some point was an error that caused this to stop working? if it was long ago, well, maybe no use in undoing it. But if it happened very recently maybe we need to revisit. |
There's usually good reasons for upgrading crypto stuff like bouncy castle; nothing to feel bad about. How about I take this patch & add the explicit activation 1.1.1 ref to reinstate it, leave the rest out? |
I guess leave a note about why the activation artifact is managed there. Safer to keep it I guess. That artifact will almost surely not change. |
BTW, the activation framework (primariy used for some mime type stuff) is still being developed, now on github at @javaee https://github.com/javaee/activation. At least it is being maintained |
OK, I've reinstated javax.activation 1.1.1 as an export from spark core (over v 1.1), point to this JIRA in the comments, and updated the -deps lists to remove the others. Removing jets3t is the right thing to do. It's never going to work again & while the AWS SDK is equally troublesome to keep up to date, the hadoop-aws's move to a shaded JAR removes transitive dependency conflict as a source of friction and pain. |
pom.xml
Outdated
@@ -141,7 +141,7 @@ | |||
<codahale.metrics.version>3.1.5</codahale.metrics.version> | |||
<avro.version>1.7.7</avro.version> | |||
<avro.mapred.classifier>hadoop2</avro.mapred.classifier> | |||
<jets3t.version>0.9.4</jets3t.version> | |||
<javax.activation.version>1.1.1</javax.activation.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine; if it's a version only ever used once, I also find it OK to inline to avoid one level of indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll inline. It's not like its one of those dependencies that anyone doing their own build will ever want to change.
dev/deps/spark-deps-hadoop-2.6
Outdated
@@ -21,8 +21,6 @@ automaton-1.11-8.jar | |||
avro-1.7.7.jar | |||
avro-ipc-1.7.7.jar | |||
avro-mapred-1.7.7-hadoop2.jar | |||
base64-2.3.8.jar | |||
bcprov-jdk15on-1.58.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so bouncy castle is also removed? Maybe I miss what bcprov is, but I think it's the "bouncycastle provider". If so, the license in licenses-binary/ and in LICENSE-binary should be removed. And there's an entry separately in the parent POM that brings this in on account of jets3t, it says. That can go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's nothing else for bouncy castle then it's cleaner for classpaths if it goes, as for imports.
looking at Hadoop, it it pulls in bcprov-jdk16, but only for testing. and with out any explicit calls on bouncy castle APIs that I can see. Presumably its registered as a crypto
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk16</artifactId>
<version>1.46</version>
<scope>test</scope>
</dependency>
If it is going, I should update this PR & JIRA name to make clear it is gone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it's already been removed, so the other 'remnants' should go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveloughran I'll merge this if you can remove the bouncy castle license info and take one more pass at comments here.
Test build #92614 has finished for PR 21146 at commit
|
@steveloughran since I wanted to expedite this -- it relates to ECCNs which I'm fixing up -- I opened #22081 WDYT? |
965c73c
to
9adc0ec
Compare
closing as #22081 supercedes it |
Test build #94702 has finished for PR 21146 at commit
|
FYI, I just did a kinesis test run with this PR on a JVM with the unlimited JCE installed (explicitly verified by shasum of the relevant JARs); failure with cert errors.
|
And if you bump up the kinesis client and AWS SDK version to 1.11.271, those failures go away.
|
This also updates the dev/deps dependency files so it is not expected Change-Id: I15798710925d378de97523f7f89dbe5bd1cc8582
…e dependencies so the version it transitively exports is the same as the one it used to when jets3t was on the import graph. Remove the other JARs from the deps list. Change-Id: If22bb1b26381e2e8a3df050a535d644f45d306fe
Change-Id: I1298d6c3063ece8bc86575d781a85ba5309cda39
…cense files Change-Id: Iaf2e87ca57d46592551cfdbe6d5f8b419a92bcae
…k to match Change-Id: Ibf4162fca33189086bef234b6752f403a06aa7b0
853ee1c
to
3ac1d63
Compare
Test build #94770 has finished for PR 21146 at commit
|
Test build #94769 has finished for PR 21146 at commit
|
Test build #4258 has finished for PR 21146 at commit
|
Closing now we have a test run with the combination of: No jets3t, no bouncy castle, upgraded kinesis. all the kinesis tests now run |
…ions This PR has been superceded by apache#22081 ## What changes were proposed in this pull request? Increment the kinesis client, producer and transient AWS SDK versions to a more recent release. This is to help with the move off bouncy castle of apache#21146 and apache#22081; the goal is that moving up to the new SDK will allow a JVM with unlimited JCE but without bouncy castle to work with Kinesis endpoints. Why this specific set of artifacts? it syncs up with the 1.11.271 AWS SDK used by hadoop 3.0.3, hadoop-3.1. and hadoop 3.1.1; that's been stable for the uses there (s3, STS, dynamo). ## How was this patch tested? Running all the external/kinesis-asl tests via maven with java 8.121 & unlimited JCE, without bouncy castle (apache#21146); default endpoint of us-west.2. Without this SDK update I was getting http cert validation errors, with it they went away. # This PR is not ready without * Jenkins test runs to see what it is happy with * more testing: repeated runs, another endpoint * looking at the new deprecation warnings and selectively addressing them (the AWS SDKs are pretty aggressive about deprecation, but sometimes they increase the complexity of the client code or block some codepaths off completely) Closes apache#22099 from steveloughran/cloud/SPARK-25111-kinesis. Authored-by: Steve Loughran <stevel@hortonworks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
## What changes were proposed in this pull request? Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps Note this just takes over #21146 ## How was this patch tested? Existing tests. Closes #22081 from srowen/SPARK-23654. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps Note this just takes over apache#21146 Existing tests. Closes apache#22081 from srowen/SPARK-23654. Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
With the update of bouncy-castle JAR in Spark 2.3; jets3t doesn't work any more; hence the s3n
connector to S3 is dead. Only one person has noticed so far. The hadoop s3n connector is never going to be updated to work with a later version of jets3t, instead the code has just been cut from hadoop 3.x entirely..
This patch remove the declarations of jets3t from the POMs which include it (root and spark-core), so it is not being packaged up.
Of the transitive dependencies, one is pinned at the same version, the others removed.
How was this patch tested?
Seeing what jenkins has to say about the missing/changed dependencies.