-
Notifications
You must be signed in to change notification settings - Fork 38
[HADOOP-18342] shaded avro jar #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'd prefer retaining jackson unshaded just to avoid the need to do another release of this every time there's a jackson CVE emergency leaving out slf4j is good and flexible; though the dependencies of the shaded jar should declare that and compress as either provided or required. now, the parquet lib includes the version. I'm not sure that is the right thing to do (as it complicates that update), but it would allow someone brave to have the different versions coexisting. thoughts? |
Thanks @steveloughran - I'll look into changing the shaded avro jar and pom.xml as you suggest. I'm not sure if I understand your point about parquet. Do you think parquet needs to be shaded as an extra hadoop-thirdparty jar? Or does hadoop use parquet-avro jar and if so, would we need to shade parquet-avro too? |
it's that because the parquet bit of the shaded jar has a version in it, upgrading is harder than just bumping up the build number, i need to change the module name etc too. |
no problem, great that you are working on this. |
@steveloughran I have made some progress with the hadoop-avro jar. I still need to fully test it in a hadoop build though. I have noticed that hadoop also uses avro-maven-plugin to compile generated classes. It does appear that we will need to shade this too because at the moment avro-maven-plugin generates classes that import org.apache.avro classes. |
@steveloughran I think it's easier to keep using the standard avro-maven-plugin and to postprocess the generated source to replace the org.apache.avro package names in the generated source. I have this approach working in a local hadoop build. |
@steveloughran could you review this? apache/hadoop#4854 uses a temp jar I published myself but can be adjusted when this hadoop-thirdparty PR is merged and released. |
lgtm; needs a newline at the end of the new pom. |
do see #19 and discussion about whether artifacts need names. it may be good to use a version number here... |
I can rename the module and jar to hadoop-shaded-avro_1_11 if that makes sense. |
let's do that. yes, upgrading is harder, but it's a lot more obvious what is in use. (the other strategy would be to include it in the version itself, but that'd just confuse people) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
this is in; we need to get the protobuf update in and it'll be good for a release. though we may also want to update guava, which maybe we can do by adding a guava artifact with a version... |
https://issues.apache.org/jira/browse/HADOOP-18342
This is based on latest avro jar (1.11.1). The shaded avro jar also contains a shaded copy of jackson (based on version avro needs). This is jackson 2.12.7 which matches what hadoop uses but I think it might be safer not to rely on avro and hadoop needing the same version of jackson and jackson upgrades can prove problematic for downstream projects.
I have omitted these jars from the shaded jar:
These 2 are more stable and might be tidier not to completely bloat the avro shaded jar.