[WIP][SPARK-27733][CORE] Upgrade Avro to 1.9.2 #27609
ok to test.
Test build #118608 has finished for PR 27609 at commit
Hi, @iemejia. Could you fix the PR description? This PR is about
BTW, for the build failure, please do the following in your branch to update the dependency manifest.
dev/test-dependencies.sh --replace-manifest
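After regenerating the manifest, it can help to confirm which Avro (and paranamer) entries it actually lists, so that a module left on an older version stands out. The Python sketch below is a minimal illustration under stated assumptions: it assumes each manifest line has the form artifactId/version/classifier/jarName, and the helper names versions_by_artifact and mismatched_avro_modules, as well as the sample entries, are hypothetical and not part of Spark's tooling.

```python
from collections import defaultdict


def versions_by_artifact(manifest_lines, prefixes=("avro", "paranamer")):
    """Map each artifact whose name starts with one of `prefixes` to its versions.

    ASSUMPTION: each entry looks like artifactId/version/classifier/jarName.
    """
    found = defaultdict(set)
    for raw in manifest_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        parts = line.split("/")
        if len(parts) < 2:
            continue  # skip lines that do not follow the assumed format
        artifact, version = parts[0], parts[1]
        if artifact.startswith(prefixes):  # str.startswith accepts a tuple
            found[artifact].add(version)
    return dict(found)


def mismatched_avro_modules(found, expected):
    """Return the Avro modules whose listed versions are not exactly {expected}."""
    return sorted(
        artifact
        for artifact, versions in found.items()
        if artifact.startswith("avro") and versions != {expected}
    )


# Hypothetical manifest entries, for illustration only.
SAMPLE_MANIFEST = [
    "avro/1.9.2//avro-1.9.2.jar",
    "avro-ipc/1.9.2//avro-ipc-1.9.2.jar",
    "avro-mapred/1.9.2//avro-mapred-1.9.2.jar",
    "jackson-module-scala_2.12/2.10.0//jackson-module-scala_2.12-2.10.0.jar",
    "paranamer/2.8//paranamer-2.8.jar",
]

found = versions_by_artifact(SAMPLE_MANIFEST)
print(found)
print(mismatched_avro_modules(found, "1.9.2"))
```

Against a real manifest, the same function can be fed an open file object, since iterating over it yields one line at a time; adjust the parsing if the actual manifest format differs from the one assumed here.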
Branch updated from d4b1bda to f667d62.
Thanks, @dongjoon-hyun. I am new to the Spark dev process, so any hints or feedback are greatly appreciated.
Test build #118628 has finished for PR 27609 at commit
Test build #118636 has finished for PR 27609 at commit
Hive still depends on an older version of Avro:
Yes, @Fokkko, and I don't know whether we can work around that, because it is a transitive dependency. That is the whole point of my argument for upgrading the dependency in Hive (HIVE-21737) and then catching up with the upgraded version here in Spark. But even then, we would still need the same kind of fix for the fork of Hive 1.x that Spark depends on (I don't even know where its code lives). Or am I missing a better fix?
Thank you for making a PR, @iemejia.
Thanks, @dongjoon-hyun. My goal with this PR was to share the work and show you the issues, since, as you can see, this is definitely out of my hands. I hope you or someone else on the Spark PMC has contacts with the Hive folks so we can untangle this mess together. Don't hesitate to ping me if you need me to reopen this or join that discussion.
Thanks, @iemejia.
Any updates on the Avro version in Spark?
There has been recent progress, but we still need a release of the Hive dependencies. If you want to follow the 'action', more details are here: https://issues.apache.org/jira/browse/HIVE-21737
@iemejia, we can now upgrade Avro since the built-in Hive has been upgraded to 2.3.8.
+1 to @wangyum's comment.
Thanks for the heads-up, @wangyum. I am going to reopen and rebase this; let's see...
What changes were proposed in this pull request?
This PR upgrades Avro to 1.9.2.
Why are the changes needed?
Spark lags behind major improvements and cleanups in Avro. The upgrade can also remove some extra dependencies (e.g. paranamer, and possibly the old Jackson versions with security vulnerabilities that are still present in Avro 1.8.x and 1.7.x).
Also, Parquet 1.11.0 needs Avro 1.9.x to run, so staying on the older Avro version could cause more issues once Parquet is upgraded. For reference: #26804
Does this PR introduce any user-facing change?
Unknown
How was this patch tested?
Partially. Some parts are still failing, so for now this is a WIP to get feedback, especially on the situation with the transitive Hive dependencies. PTAL at the Jira ticket SPARK-27733 for more details on the dependency issues.
What has been done so far?
code in Spark should use this one directly.
Also, the avro-mapred dependency on the avro-ipc tests artifact was removed in 1.8.x, so that dependency is probably still there for compatibility with old Hive.
jackson-module-paranamer no longer exists in the code base. (Note: I could not get rid of the paranamer deps in dev/deps because they come in transitively from jackson-module-scala_2.11.)