Arrow: Bump Apache Arrow 7.0.0 #4112
It looked like a big change jumping from 6.0.0 to 7.0.0, but the only release we skipped in between was 6.0.1, at the end of 2021.
I thought I'd leave that note for anybody who, like me, initially looks at this and thinks it is a huge change. We only "missed" one patch version. The update should be just the combination of 6.0.1 and what's listed in the changelog in the PR description (which isn't trivial, but that changelog should cover most things).
We'll want to run the benchmarks to make sure there's no performance regression before committing this. You should be able to do that using the actions that @nastra set up.
@rymurr, @RussellSpitzer, @emkornfield, any concerns with this update for the 0.14.0 release?
I probably wouldn't even be averse to doing this for a 13.x release. +1
LGTM. FWIW, Spark also just upgraded on master; I'm not sure about the other engines. As an FYI, one of the more recent features in Java Arrow is bindings to the C++ Parquet Dataset reader, which is reportedly faster than parquet-mr in some cases (I'm not exactly sure how Iceberg is using Arrow Java).
I guess my only concern is potential dependency "hell" for any consumers of Iceberg (I haven't had to delve into this in Java for quite some time).
Thanks, @emkornfield! It should be okay because we shade Arrow to avoid conflicting with Spark and other engines.
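For readers unfamiliar with shading, here is a minimal sketch of how a Gradle build can relocate Arrow classes so that a bundled Arrow 7.0.0 does not clash with whatever Arrow version an engine ships. It is an illustration only, not Iceberg's actual build file: the Shadow plugin version, the choice of the arrow-vector module, and the target package name are assumptions.

```kotlin
// build.gradle.kts (hypothetical module, for illustration only)
plugins {
    java
    // Shadow plugin; the version here is an assumption
    id("com.github.johnrengelman.shadow") version "7.1.2"
}

repositories {
    mavenCentral()
}

dependencies {
    // The Arrow version this PR bumps to
    implementation("org.apache.arrow:arrow-vector:7.0.0")
}

tasks.shadowJar {
    // Rewrite Arrow's packages inside the fat jar so they cannot collide
    // with an engine's own Arrow classes. The target prefix is an assumed name.
    relocate("org.apache.arrow", "org.apache.iceberg.shaded.org.apache.arrow")
}
```

With relocation in place, consumers of the shaded jar only see the renamed packages, so the Arrow version on their own classpath can differ.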
Thanks, @pan3793!
This PR upgrades the Apache Arrow version to 7.0.0, to be consistent with Spark & Boson and to pick up new improvements and bug fixes from the latest release. It is the same as the OSS PR: apache#4112.
Release Notes: https://arrow.apache.org/release/7.0.0.html
Benchmark results
This PR (Arrow 7.0.0): https://github.com/pan3793/iceberg/actions/runs/1844723231
Master branch (Arrow 6.0.0): https://github.com/pan3793/iceberg/actions/runs/1844725462